We provide a comprehensive reply to the comment written by Stefan Boettcher
[arXiv:2210.00623] and argue that the comment singles out one particular
non-representative example problem, entirely focusing on the maximum cut
problem (MaxCut) on sparse graphs, for which greedy algorithms are expected to
perform well. Conversely, we highlight the broader algorithmic development
underlying our original work, and (within our original framework) provide
additional numerical results showing sizable improvements over our original
data, thereby refuting the comment's original performance statements.
Furthermore, it has already been shown that physics-inspired graph neural
networks (PI-GNNs) can outperform greedy algorithms, in particular on hard,
dense instances. We also argue that the internal (parallel) anatomy of graph
neural networks is very different from the (sequential) nature of greedy
algorithms, and (based on their usage at the scale of real-world social
networks) point out that graph neural networks have demonstrated their
potential for superior scalability compared to existing heuristics such as
extremal optimization. Finally, we conclude by highlighting the conceptual
novelty of our work and outlining some potential extensions.
The constitutive behavior of polymeric materials is often modeled by finite
linear viscoelastic (FLV) or quasi-linear viscoelastic (QLV) models. These
popular models are simplifications that typically cannot accurately capture the
nonlinear viscoelastic behavior of materials. For example, the success of
attempts to capture strain rate-dependent behavior has been limited so far. To
overcome this problem, we introduce viscoelastic Constitutive Artificial Neural
Networks (vCANNs), a novel physics-informed machine learning framework for
anisotropic nonlinear viscoelasticity at finite strains. vCANNs rely on the
concept of generalized Maxwell models enhanced with nonlinear strain
(rate)-dependent properties represented by neural networks. The flexibility of
vCANNs enables them to automatically identify accurate and sparse constitutive
models of a broad range of materials. To test vCANNs, we trained them on
stress-strain data from Polyvinyl Butyral, the electro-active polymers VHB 4910
and 4905, and a biological tissue, the rectus abdominis muscle. Different
loading conditions were considered, including relaxation tests, cyclic
tension-compression tests, and blast loads. We demonstrate that vCANNs can
learn to capture the behavior of all these materials accurately and
computationally efficiently without human guidance.
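The generalized Maxwell backbone that vCANNs build on can be sketched in a few lines. Below, the relaxation modulus is a sum of exponentially decaying branches with constant stiffnesses and relaxation times; vCANNs instead let neural networks make these quantities strain (rate)-dependent, so this is only a hedged illustration of the underlying concept, not the authors' model:

```python
import numpy as np

def maxwell_relaxation(t, E_inf, branches):
    """Relaxation modulus of a generalized Maxwell model: an equilibrium
    stiffness E_inf plus a sum of exponentially decaying branches
    (E_i, tau_i). In vCANNs the branch parameters become nonlinear
    functions of strain (rate) represented by neural networks; here they
    are constants for illustration."""
    E = np.full_like(t, E_inf, dtype=float)
    for E_i, tau_i in branches:
        E += E_i * np.exp(-t / tau_i)
    return E
```

At time zero the modulus equals the sum of all stiffnesses; for times much longer than every relaxation time it decays to the equilibrium value.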
We propose learning a depth covariance function with applications to
geometric vision tasks. Given RGB images as input, the covariance function can
be flexibly used to define priors over depth functions, predictive
distributions given observations, and methods for active point selection. We
leverage these techniques for a selection of downstream tasks: depth
completion, bundle adjustment, and monocular dense visual odometry.
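As a toy sketch of the "predictive distributions given observations" use, one can stand in a fixed RBF kernel over pixel coordinates for the learned covariance and form the usual Gaussian-process posterior over depth at query pixels. The kernel form, length scale, and noise level here are illustrative assumptions, not the paper's learned function:

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Stationary RBF kernel over pixel coordinates; a fixed stand-in
    for the learned depth covariance function."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def depth_posterior(px_obs, depth_obs, px_query, noise=1e-4):
    """GP predictive mean/variance of depth at query pixels given sparse
    depth observations."""
    K = rbf(px_obs, px_obs) + noise * np.eye(len(px_obs))
    Ks = rbf(px_query, px_obs)
    mean = Ks @ np.linalg.solve(K, depth_obs)
    # prior variance is 1.0 for this kernel
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var
```

Querying at an observed pixel returns approximately the observed depth with near-zero variance; far from observations the variance reverts to the prior.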
While end-to-end learning systems are rapidly gaining capabilities and
popularity, the increasing computational demands for deploying such systems,
along with a lack of flexibility, adaptability, explainability, reasoning and
verification capabilities, require new types of architectures. Here we
introduce a classification of hybrid systems which, based on an analysis of
human knowledge and intelligence, combines neural learning with various types
of knowledge and knowledge sources. We present the Thrill-K architecture as a
prototypical solution for integrating instantaneous knowledge, standby
knowledge and external knowledge sources in a framework capable of inference,
learning and intelligent control.
This paper investigates the universal approximation capabilities of
Hamiltonian Deep Neural Networks (HDNNs) that arise from the discretization of
Hamiltonian Neural Ordinary Differential Equations. Recently, it has been shown
that HDNNs enjoy, by design, non-vanishing gradients, which provide numerical
stability during training. However, although HDNNs have demonstrated
state-of-the-art performance in several applications, a comprehensive study to
quantify their expressivity is missing. In this regard, we provide a universal
approximation theorem for HDNNs and prove that a portion of the flow of HDNNs
can approximate any continuous function arbitrarily well over a compact domain.
This result provides a solid theoretical foundation for the practical use of
HDNNs.
Head MRI pre-processing involves converting raw images to an
intensity-normalized, skull-stripped brain in a standard coordinate space. In
this paper, we propose an end-to-end weakly supervised learning approach,
called Neural Pre-processing (NPP), for solving all three sub-tasks
simultaneously via a neural network, trained on a large dataset without
individual sub-task supervision. Because the overall objective is highly
under-constrained, we explicitly disentangle geometric-preserving intensity
mapping (skull-stripping and intensity normalization) and spatial
transformation (spatial normalization). Quantitative results show that our
model outperforms state-of-the-art methods which tackle only a single sub-task.
Our ablation experiments demonstrate the importance of the architecture design
we chose for NPP. Furthermore, NPP affords the user the flexibility to control
each of these tasks at inference time. The code and model are freely available
at \url{https://github.com/Novestars/Neural-Pre-processing}.
Ensembles based on k nearest neighbours (kNN) combine a large number of base
learners, each constructed on a sample taken from the given training data.
Typical kNN-based ensembles predict the class of a test point from the k
closest observations in the training data, i.e., those lying within a
spherical region around the test point. In this paper, a novel random
projection extended neighbourhood rule (RPExNRule) ensemble is proposed, in
which bootstrap samples from the given training data are randomly projected
into lower dimensions to add randomness to the base models while preserving
feature information. It uses the extended neighbourhood rule (ExNRule) to fit
kNN base learners on the randomly projected bootstrap samples.
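A minimal numpy sketch of this scheme, with plain kNN majority voting standing in for the exact ExNRule and all hyperparameter values chosen for illustration:

```python
import numpy as np

def rp_exnrule_ensemble(X, y, X_test, n_models=25, d_low=3, k=5, seed=0):
    """Sketch of the RPExNRule idea: each base learner is a kNN
    classifier fit on a bootstrap sample that has been randomly
    projected to a lower dimension; the ensemble predicts by majority
    vote. Plain kNN voting stands in for the exact ExNRule here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    votes = np.zeros((X_test.shape[0], n_models), dtype=int)
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)                   # bootstrap sample
        R = rng.normal(size=(d, d_low)) / np.sqrt(d_low)   # random projection
        Xb, yb = X[idx] @ R, y[idx]
        Xt = X_test @ R
        for i, x in enumerate(Xt):
            nn = np.argsort(np.linalg.norm(Xb - x, axis=1))[:k]
            votes[i, m] = np.bincount(yb[nn]).argmax()
    # majority vote across the base models
    return np.array([np.bincount(v).argmax() for v in votes])
```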
Hello there!
I've recently been working on Serge, a self-hosted dockerized way of running LLaMa models with a decent UI & stored conversations. It currently supports Alpaca 7B, 13B and 30B and we're working on integrating it with LangChain and the ReAct chain agent.
I've tried my best at making the instructions dead easy, so it's all dockerized with a download manager for weights and it can be run with almost zero configuration required.
I think being able to run those models locally will be key to expanding their ability, and so I hope this can contribute to that.
Let me know if you have any feedback or suggestions on how to extend its capabilities!
GitHub: https://github.com/nsarrazin/serge
submitted by /u/SensitiveCranberry
Game developer CD PROJEKT RED today at the Game Developers Conference in San Francisco unveiled a technology preview for Cyberpunk 2077 with path tracing, coming April 11. Path tracing, also known as full ray tracing, accurately simulates light throughout an entire scene. It’s used by visual effects artists to create film and TV graphics that […]
Gamers wanted better graphics. GPUs delivered. Those GPUs became the key to the world-changing AI revolution. Now gamers are reaping the benefits. At GDC 2023 in San Francisco this week, the gaming industry’s premier developers conference, NVIDIA made a series of announcements, including new games and game development tools that promise to accelerate innovations at […]
Like old friends catching up over coffee, two industry icons reflected on how modern AI got its start, where it’s at today and where it needs to go next. Jensen Huang, founder and CEO of NVIDIA, interviewed AI pioneer Ilya Sutskever in a fireside chat at GTC. The talk was recorded a day after the […]
Building AI applications is hard. Putting them to use across a business can be even harder. Less than one-third of enterprises that have begun adopting AI actually have it in production, according to a recent IDC survey. Businesses often realize the full complexity of operationalizing AI just prior to launching an application. Problems discovered so […]
With Amazon Rekognition Custom Labels, you can have Amazon Rekognition train a custom model for object detection or image classification specific to your business needs. For example, Rekognition Custom Labels can find your logo in social media posts, identify your products on store shelves, classify machine parts in an assembly line, distinguish healthy and infected […]
There has been a paradigm change in the mindshare of education customers who are now willing to explore new technologies and analytics. Universities and other higher learning institutions have collected massive amounts of data over the years, and now they are exploring options to use that data for deeper insights and better educational outcomes. You […]
In this post, we show how to configure a new OAuth-based authentication feature for using Snowflake in Amazon SageMaker Data Wrangler. Snowflake is a cloud data platform that provides data solutions for data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and […]
GPT-4: Chatbots and Data Prep Ain’t What They Used To Be With the recent launch of OpenAI’s GPT-4, Google Bard and Anthropic’s Claude, reporters on the AI beat this week got to compare and contrast three prominent large language model (LLM) chatbot approaches. Their initial conclusion seems to be that all three have comparable…
Deductive domains are typical of many cognitive skills in that no single
problem-solving strategy is always optimal for solving all problems. It was
shown that students who know how and when to use each strategy (StrTime)
outperformed those who know neither and stick to the default strategy
(Default). In this work, students were trained on a logic tutor that supports a
default forward-chaining and a backward-chaining (BC) strategy, then a
probability tutor that only supports BC. We investigated three types of
interventions on teaching the Default students how and when to use which
strategy on the logic tutor: Example, Nudge and Presented. Meanwhile, StrTime
students received no interventions. Overall, our results show that Nudge
students outperformed their Default peers and caught up with StrTime students
on both tutors.
Grokking is a phenomenon where a model trained on an algorithmic task first
overfits but, then, after a large amount of additional training, undergoes a
phase transition to generalize perfectly. We empirically study the internal
structure of networks undergoing grokking on the sparse parity task, and find
that the grokking phase transition corresponds to the emergence of a sparse
subnetwork that dominates model predictions. On an optimization level, we find
that this subnetwork arises when a small subset of neurons undergoes rapid norm
growth, whereas the other neurons in the network decay slowly in norm. Thus, we
suggest that the grokking phase transition can be understood to emerge from
competition of two largely distinct subnetworks: a dense one that dominates
before the transition and generalizes poorly, and a sparse one that dominates
afterwards.
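The "rapid norm growth of a small subset of neurons" suggests a simple monitoring utility for a one-hidden-layer network: rank neurons by the product of their incoming and outgoing weight norms and find the smallest subset carrying most of the mass. The norm product and the 90% threshold are our illustrative choices, not the paper's exact measure:

```python
import numpy as np

def neuron_norms(W_in, W_out):
    """Per-neuron saliency: product of a hidden neuron's incoming and
    outgoing weight norms."""
    return np.linalg.norm(W_in, axis=0) * np.linalg.norm(W_out, axis=1)

def dominant_subnetwork(W_in, W_out, frac=0.9):
    """Smallest set of neurons accounting for `frac` of the total neuron
    norm mass -- a crude probe for the sparse subnetwork whose rapid
    norm growth accompanies the grokking transition."""
    s = neuron_norms(W_in, W_out)
    order = np.argsort(s)[::-1]
    cum = np.cumsum(s[order]) / s.sum()
    k = int(np.searchsorted(cum, frac)) + 1
    return order[:k]
```

Tracking the size of this set over training steps would show it collapsing to a small, stable subset around the phase transition.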
Owing to the rapid dynamics and pervasive uncertainties of quantitative
markets, the question of how to act profitably in stock trading remains a
challenging one. Reinforcement learning (RL), as a
reward-oriented approach for optimal control, has emerged as a promising method
to tackle this strategic decision-making problem in such a complex financial
scenario. In this paper, we integrated two prior financial trading strategies
named constant proportion portfolio insurance (CPPI) and time-invariant
portfolio protection (TIPP) into multi-agent deep deterministic policy gradient
(MADDPG) and proposed two specifically designed multi-agent RL (MARL) methods:
CPPI-MADDPG and TIPP-MADDPG for investigating strategic trading in quantitative
markets. Afterward, we selected 100 different shares in the real financial
market to test these specifically proposed approaches. The experiment results
show that CPPI-MADDPG and TIPP-MADDPG approaches generally outperform the
conventional ones.
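For reference, the classical CPPI and TIPP rules that these agents incorporate can be stated in a few lines of plain Python; the multiplier and protection-fraction defaults are illustrative, and this is a sketch of the textbook strategies, not of the MADDPG agents themselves:

```python
def cppi_allocation(wealth, floor, multiplier=3.0):
    """Constant proportion portfolio insurance (CPPI): expose a fixed
    multiple of the cushion (wealth above the protected floor) to the
    risky asset and keep the remainder in the safe asset."""
    cushion = max(wealth - floor, 0.0)
    risky = min(multiplier * cushion, wealth)   # cannot exceed total wealth
    return risky, wealth - risky

def tipp_floor(prev_floor, wealth, protect=0.8):
    """Time-invariant portfolio protection (TIPP): the floor ratchets up
    to a fixed fraction of the highest wealth reached so far."""
    return max(prev_floor, protect * wealth)
```

In the paper's methods, the RL agents' actions are shaped by these protection mechanisms rather than following them mechanically.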
This work introduces the notion of intermediate concepts based on a levels
structure to aid explainability for black-box models. The levels structure is
a hierarchical structure in which each level corresponds to a partition of the
features of a dataset (i.e., a player-set partition). Coarseness increases
from the finest partition, which comprises only singletons, to the coarsest,
which contains only the grand coalition. In addition, it is possible to
establish meronomies, i.e.,
part-whole relationships, via a domain expert that can be utilised to generate
explanations at an abstract level. We illustrate the usability of this approach
in a real-world car model example and the Titanic dataset, where intermediate
concepts aid in explainability at different levels of abstraction.
In this work we present a deep learning approach to conduct hypothesis-free,
transcriptomics-based matching of drugs for diseases. Our proposed neural
network architecture is trained on approved drug-disease indications, taking as
input the relevant disease and drug differential gene expression profiles, and
learns to identify novel indications. We assemble an evaluation dataset of
disease-drug indications spanning 68 diseases and evaluate in silico our
approach against the most widely used transcriptomics-based matching baselines,
CMap and the Characteristic Direction. Our results show a more than 200%
improvement over both baselines in terms of standard retrieval metrics. We
further showcase our model's ability to capture different genes' expressions
interactions among drugs and diseases. We provide our trained models, data and
code to predict with them at https://github.com/healx/dgem-nn-public.
Using deep learning methods to classify EEG signals can accurately identify
people's emotions. However, existing studies have rarely considered using
information from other domains' representations to guide feature selection in
the time-frequency domain. We propose a classification network of
EEG signals based on the cross-domain feature fusion method, which makes the
network more focused on the features most related to brain activities and
thinking changes by using the multi-domain attention mechanism. In addition, we
propose a two-step fusion method and apply these methods to the EEG emotion
recognition network. Experimental results show that our proposed network, which
combines multiple representations in the time-frequency domain and spatial
domain, outperforms previous methods on public datasets and achieves
state-of-the-art performance.
Biological neural networks continue to inspire breakthroughs in neural
network performance. And yet, one key area of neural computation that has been
under-appreciated and under-investigated is biologically plausible,
energy-efficient spiking neural networks, whose potential is especially
attractive for low-power, mobile, or otherwise hardware-constrained settings.
We present a literature review of recent developments in the interpretation,
optimization, efficiency, and accuracy of spiking neural networks. Key
contributions include identification, discussion, and comparison of
cutting-edge methods in spiking neural network optimization, energy-efficiency,
and evaluation, starting from first principles so as to be accessible to new
practitioners.
Continual learning is a problem for artificial neural networks that their
biological counterparts are adept at solving. Building on work using Sparse
Distributed Memory (SDM) to connect a core neural circuit with the powerful
Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is
a strong continual learner. We find that every component of our MLP variant
translated from biology is necessary for continual learning. Our solution is
also free from any memory replay or task information, and introduces novel
methods to train sparse networks that may be broadly applicable.
Riemannian submanifold optimization with momentum is computationally
challenging because ensuring iterates remain on the submanifold often requires
solving difficult differential equations. We simplify such optimization
algorithms for the submanifold of symmetric positive-definite matrices with the
affine invariant metric. We propose a generalized version of the Riemannian
normal coordinates which dynamically trivializes the problem into a Euclidean
unconstrained problem. We use our approach to explain and simplify existing
approaches for structured covariances and develop efficient second-order
optimizers for deep learning without explicit matrix inverses.
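For context, a standard (non-simplified) way to keep iterates symmetric positive-definite is to step along the affine-invariant exponential map, which requires matrix square roots and exponentials; the paper's generalized normal coordinates are designed to avoid exactly this kind of computation. A hedged numpy sketch of the standard step:

```python
import numpy as np

def sym_fun(X, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(f(w)) @ V.T

def spd_exp_step(X, G, lr=0.1):
    """One descent step along the affine-invariant exponential map on
    the SPD manifold: X <- X^{1/2} exp(-lr X^{-1/2} G X^{-1/2}) X^{1/2}.
    G is a symmetric gradient direction. This is the classical update
    the paper simplifies away, not the paper's own method."""
    Xh = sym_fun(X, np.sqrt)
    Xih = sym_fun(X, lambda w: 1.0 / np.sqrt(w))
    inner = Xih @ G @ Xih
    return Xh @ sym_fun(inner, lambda w: np.exp(-lr * w)) @ Xh
```

By construction the update is a congruence of a positive-definite matrix, so the iterate stays on the manifold for any step size.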
Federated Learning (FL) is a collaborative machine learning (ML) framework
that combines on-device training and server-based aggregation to train a common
ML model among distributed agents. In this work, we propose an asynchronous FL
design with periodic aggregation to tackle the straggler issue in FL systems.
Considering limited wireless communication resources, we investigate the effect
of different scheduling policies and aggregation designs on the convergence
performance. Driven by the importance of reducing the bias and variance of the
aggregated model updates, we propose a scheduling policy that jointly considers
the channel quality and training data representation of user devices. The
effectiveness of our channel-aware data-importance-based scheduling policy,
compared with state-of-the-art methods proposed for synchronous FL, is
validated through simulations. Moreover, we show that an ``age-aware''
aggregation weighting design can significantly improve the learning performance
in an asynchronous FL setting.
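An "age-aware" weighting can be sketched by discounting each client update by its staleness before averaging; the exponential-decay form below is an illustrative choice, not the paper's exact design:

```python
import numpy as np

def age_aware_aggregate(global_w, updates, ages, decay=0.5):
    """Sketch of 'age-aware' aggregation for asynchronous FL: each
    client update is down-weighted by its age (number of aggregation
    rounds since the global model it was computed from), then the
    weights are normalized and the combined update is applied."""
    coeffs = np.array([decay ** a for a in ages], dtype=float)
    coeffs /= coeffs.sum()
    agg = sum(c * u for c, u in zip(coeffs, updates))
    return global_w + agg
```

A fresh update (age 0) thus contributes at full weight, while a straggler's stale update is discounted rather than discarded.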
This article presents the DeepSense 6G dataset, which is a large-scale
dataset based on real-world measurements of co-existing multi-modal sensing and
communication data. The DeepSense 6G dataset is built to advance deep learning
research in a wide range of applications in the intersection of multi-modal
sensing, communication, and positioning. This article provides a detailed
overview of the DeepSense dataset structure, adopted testbeds, data collection
and processing methodology, deployment scenarios, and example applications,
with the objective of facilitating the adoption and reproducibility of
multi-modal sensing and communication datasets.
Building accurate Deep Learning (DL) models for brain age prediction is a
very relevant topic in neuroimaging, as it could help better understand
neurodegenerative disorders and find new biomarkers. To estimate accurate and
generalizable models, large datasets have been collected, which are often
multi-site and multi-scanner. This large heterogeneity negatively affects the
generalization performance of DL models since they are prone to overfit
site-related noise. Recently, contrastive learning approaches have been shown
to be more robust against noise in data or labels. For this reason, we propose
a novel contrastive learning regression loss for robust brain age prediction
using MRI scans. Our method achieves state-of-the-art performance on the
OpenBHB challenge, yielding the best generalization capability and robustness
to site-related noise.
In model-based reinforcement learning for safety-critical control systems, it
is important to formally certify system properties (e.g., safety, stability)
under the learned controller. However, as existing methods typically apply
formal verification \emph{after} the controller has been learned, it is
sometimes difficult to obtain any certificate, even after many iterations
between learning and verification. To address this challenge, we propose a
framework that jointly conducts reinforcement learning and formal verification
by formulating and solving a novel bilevel optimization problem, which is
differentiable by the gradients from the value function and certificates.
Experiments on a variety of examples demonstrate the significant advantages of
our framework over the model-based stochastic value gradient (SVG) method and
the model-free proximal policy optimization (PPO) method in finding feasible
controllers with barrier functions and Lyapunov functions that ensure system
safety and stability.
In this paper, we present a new approach to mental state classification from
EEG signals by combining signal processing techniques and machine learning (ML)
algorithms. We evaluate the performance of the proposed method on a dataset of
EEG recordings collected during a cognitive load task and compared it to other
state-of-the-art methods. The results show that the proposed method achieves
high accuracy in classifying mental states and outperforms state-of-the-art
methods in terms of classification accuracy and computational efficiency.
Randomized neural networks (randomized NNs), in which only the terminal
layer's weights are optimized, constitute a powerful model class for reducing
computational
time in training the neural network model. At the same time, these models
generalize surprisingly well in various regression and classification tasks. In
this paper, we give an exact macroscopic characterization (i.e., a
characterization in function space) of the generalization behavior of
randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs
correspond to a generalized additive model (GAM)-typed regression in which
infinitely many directions are considered: the infinite generalized additive
model (IGAM). The IGAM is formalized as the solution to an optimization
problem in function space for a specific regularization functional and a
fairly general
loss. This work extends to multivariate NNs our prior work, in which we showed
that, under certain conditions and for one-dimensional input, wide RSNs with
ReLU activation behave like spline regression.
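In its simplest univariate-output form, an RSN is just frozen random ReLU features with the terminal layer fit in closed form by ridge regression; the width, weight scale, and regularization below are illustrative:

```python
import numpy as np

def fit_rsn(X, y, width=200, reg=1e-6, seed=0):
    """Minimal randomized shallow ReLU network: hidden weights are drawn
    at random and frozen; only the terminal layer is fit, here by ridge
    regression in closed form."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], width))
    b = rng.normal(size=width)
    H = np.maximum(X @ W + b, 0.0)            # frozen random ReLU features
    beta = np.linalg.solve(H.T @ H + reg * np.eye(width), H.T @ y)
    return lambda Xn: np.maximum(Xn @ W + b, 0.0) @ beta
```

With many more random features than data points, the fit interpolates the training data almost exactly, which is the overparameterized regime the function-space characterization addresses.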
Our study focuses on determining the best weight windows for a weighted
moving average smoother under squared loss. We show that there exists an
optimal weight window that is symmetrical around its center. We study the class
of tapered weight windows, which decrease in weight as they move away from the
center. We formulate the corresponding least squares problem as a quadratic
program and finally as a projection of the origin onto a convex polytope.
Additionally, we provide some analytical solutions to the best window when some
conditions are met on the input data.
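The quadratic-programming view suggests a simple computational route. The sketch below replaces the exact QP solve with projected gradient descent onto the probability simplex (an assumption-laden simplification: the weights are constrained to be nonnegative and sum to one, and the clean signal is taken as the regression target):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def best_window(x_noisy, x_clean, half=3, iters=500, lr=0.05):
    """Fit a moving-average window of length 2*half+1 minimizing squared
    loss against a reference signal, by projected gradient descent."""
    L = 2 * half + 1
    n = len(x_noisy)
    A = np.zeros((n, L))
    for j in range(L):
        A[:, j] = np.roll(x_noisy, j - half)   # column j: shifted signal
    w = np.full(L, 1.0 / L)                    # start from uniform window
    for _ in range(iters):
        grad = A.T @ (A @ w - x_clean) / n
        w = project_simplex(w - lr * grad)
    return w
```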
This paper extends standard results from learning theory with independent
data to sequences of dependent data. Contrary to most of the literature, we do
not rely on mixing arguments or sequential measures of complexity and derive
uniform risk bounds with classical proof patterns and capacity measures. In
particular, we show that the standard classification risk bounds based on the
VC-dimension hold in the exact same form for dependent data, and further
provide Rademacher complexity-based bounds that remain unchanged compared to
the standard results for the independently and identically distributed case.
Finally, we show how to apply these results in the context of scenario-based
optimization in order to compute the sample complexity of random programs with
dependent constraints.
To obtain a real log canonical threshold (RLCT), which gives the Bayesian
generalization error, studies generally replace the mean error function with a
relatively simple polynomial whose RLCT equals that of the mean error
function, and derive the RLCT by resolving the polynomial's singularities
through an algebraic operation called blow-up. Although it is known that the
singularities of any polynomial can be resolved by a finite number of blow-up
iterations, it has not been clarified whether the singularities of a specific
polynomial can be resolved by a specific blow-up algorithm. This paper
therefore considers the blow-up algorithm for the polynomials called
sum-of-products (sop) polynomials and its RLCT.
We consider a variant of contextual bandits in which the algorithm consumes
multiple resources subject to linear constraints on total consumption. This
problem generalizes contextual bandits with knapsacks (CBwK), allowing for
packing and covering constraints, as well as positive and negative resource
consumption. We present a new algorithm that is simple, computationally
efficient, and admits vanishing regret. It is statistically optimal for CBwK
when an algorithm must stop once some constraint is violated. Our algorithm
builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based
technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a
regression-based technique for contextual bandits. Our analysis leverages the
inherent modularity of both techniques.
Lattice gauge equivariant convolutional neural networks (L-CNNs) are a
framework for convolutional neural networks that can be applied to non-Abelian
lattice gauge theories without violating gauge symmetry. We demonstrate how
L-CNNs can be equipped with global group equivariance. This allows us to extend
the formulation to be equivariant not just under translations but under global
lattice symmetries such as rotations and reflections. Additionally, we provide
a geometric formulation of L-CNNs and show how convolutions in L-CNNs arise as
a special case of gauge equivariant neural networks on SU($N$) principal
bundles.
There are many open-source projects and indie-built demos around the GPT-4 API. Despite the recent shift of OpenAI toward closure, open demos are always advancing the field and inspiring creativity. Here are some community projects that I find particularly interesting: https://github.com/radi-cho/awesome-gpt4. Feel free to share the things you've been building or something you've been fascinated about on social media either by joining the discussion here or by contributing to the repository:)
submitted by /u/radi-cho
https://medium.com/coiled-computing/save-money-with-spot-d499edd46ae7
submitted by /u/dask-jeeves
audioflux is a deep learning tool library for audio and music analysis and feature extraction. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations. These can be fed to deep learning networks for training, and are used to study various tasks in the audio field such as classification, separation, music information retrieval (MIR) and ASR.
Source Code: https://github.com/libAudioFlux/audioFlux
submitted by /u/Leo_D517
Llama + Alpaca-13b + 64 COARS | ./Release/chat -t 120 -m ggml-alpaca-13b-q4 - YouTube
Alpaca.cpp demo https://github.com/antimatter15/alpaca.cpp
submitted by /u/APUsilicon
I have been working on a very interesting project that aims to create an ensemble of models for a range of tasks in the Meta-ML domain. As someone who has had limited exposure to others interested in AI and has recently started exploring the field, I've made considerable progress on my own. However, I'm reaching out to find people with diverse perspectives and backgrounds who might be interested in joining me.
The project involves designing models, developing workflows, and identifying data sources. Despite my relatively short time in the AI realm, I've come up with some novel approaches, such as custom hyperparameter tuning systems and convolutional layering methods, that I believe will help improve the models' ability to learn relationships in clean data while also allowing them to func…
Deforestation is a major concern in many tropical geographies where local rainforests are at severe risk of destruction. About 17% of the Amazon rainforest has been destroyed over the past 50 years, and some tropical ecosystems are approaching a tipping point beyond which recovery is unlikely. A key driver for deforestation is raw material extraction […]
Amazon SageMaker customers can view and manage their quota limits through Service Quotas. In addition, they can view near real-time utilization metrics and create Amazon CloudWatch metrics to view and programmatically query SageMaker quotas. SageMaker helps you build, train, and deploy machine learning (ML) models with ease. To learn more, refer to Getting started with […]
As organizations grow in size and scale, the complexities of running workloads increase, and the need to develop and operationalize processes and workflows becomes critical. Therefore, organizations have adopted technology best practices, including microservice architecture, MLOps, DevOps, and more, to improve delivery time, reduce defects, and increase employee productivity. This post introduces a best practice […]
Companies across industries are looking to use interactive avatars to enhance digital experiences. But creating them is a complex, time-consuming process requiring state-of-the-art AI models that can see, hear, understand and communicate with end users. To ease this process, NVIDIA is providing creators and developers with real-time AI solutions through Omniverse Avatar Cloud Engine (ACE), […]
With AI at its tipping point, AI-enabled computer vision is being used to address the world’s most challenging problems in nearly every industry. At GTC, a global conference for the era of AI and the metaverse running through Thursday, March 23, NVIDIA announced technology updates poised to drive the next wave of vision AI adoption.
Powerful AI technologies are revolutionizing 3D content creation — whether by enlivening realistic characters that show emotion or turning simple texts into imagery. The brightest minds, artists and creators are gathering at NVIDIA GTC, a free, global conference on AI and the metaverse, taking place online through Thursday, March 23.
( 9
min )
The automotive industry is undergoing a digital revolution, driven by breakthroughs in accelerated computing, AI and the industrial metaverse. Automakers are digitalizing every phase of the product lifecycle — including concept and styling, design and engineering, software and electronics, smart factories, autonomous driving and retail — using the NVIDIA Omniverse platform and AI. Based on […]
( 7
min )
Transportation industry trailblazers are propelling their next-generation vehicles by building on NVIDIA DRIVE end-to-end solutions, which span the cloud to the car. The world’s best-selling new energy vehicle (NEV) brand BYD announced at NVIDIA GTC that it’s using the NVIDIA DRIVE Orin centralized compute platform to power an even wider range of vehicles within its […]
( 6
min )
Mitsui & Co., Ltd., one of Japan’s largest business conglomerates, is collaborating with NVIDIA on Tokyo-1 — an initiative to supercharge the nation’s pharmaceutical leaders with technology, including high-resolution molecular dynamics simulations and generative AI models for drug discovery. Announced today at the NVIDIA GTC global AI conference, the Tokyo-1 project features an NVIDIA DGX […]
( 7
min )
Digitalization that combines AI and simulation is redefining how industrial products are created and transforming how people interact with the digital world. To help enterprises tackle complex new workloads, NVIDIA has unveiled the third generation of its NVIDIA OVX computing system. OVX is designed to power large-scale digital twins built on NVIDIA Omniverse Enterprise, a […]
( 5
min )
Healthcare enterprises globally are working with NVIDIA to drive AI-accelerated solutions that are detecting diseases earlier from medical images, delivering critical insights to care teams and revolutionizing drug discovery workflows. NVIDIA Clara, a suite of software and services that powers AI healthcare solutions, is enabling this transformation industry-wide. The Clara ecosystem includes BioNeMo for drug […]
( 7
min )
Powerful AI technologies are making a massive impact in 3D content creation and game development. Whether creating realistic characters that show emotion or turning simple texts into imagery, AI tools are becoming fundamental to developer workflows — and this is just the start. At NVIDIA GTC and the Game Developers Conference (GDC), learn how the […]
( 7
min )
BMW Group is at the forefront of a key new manufacturing trend — going digital-first by using the virtual world to optimize layouts, robotics and logistics systems years before production really starts. The automaker announced today with NVIDIA at GTC that it’s expanding its use of the NVIDIA Omniverse platform for building and operating industrial […]
( 6
min )
Developers and creators can better realize the massive potential of generative AI, simulation and the industrial metaverse with new Omniverse Connectors and other updates to NVIDIA Omniverse, a platform for creating and operating metaverse applications. Omniverse Cloud, a platform-as-a-service unveiled today at NVIDIA GTC, equips users with a range of simulation and generative AI capabilities […]
( 7
min )
NVIDIA announced today at GTC that Omniverse Cloud will be hosted on Microsoft Azure, increasing access to Isaac Sim, the company’s platform for developing and managing AI-based robots. The company also said that a full lineup of Jetson Orin modules is now available, offering a performance leap for edge AI and robotics applications. “The world’s […]
( 6
min )
CCC Intelligent Solutions (CCC) has become the first company in the auto insurance industry to deliver an AI-powered repair estimating solution, called CCC Estimate – STP, short for straight-through processing. The Chicago-based auto-claims technology powerhouse uses AI, insurer-driven rules and CCC’s vast ecosystem to deliver repair estimates in seconds, instead of days. It’s a technological […]
( 6
min )
As a sports commentator for a professional lacrosse team, Grant Farhall knows the value in having the right teammates. As the chief product officer for Getty Images, a global visual-content creator and marketplace, he believes the collaboration between his company and NVIDIA is an excellent pairing for taking generative AI to the next level. The […]
( 5
min )
Large language models available today are incredibly knowledgeable, but act like time capsules — the information they capture is limited to the data available when they were first trained. If trained a year ago, for example, an LLM powering an enterprise’s AI chatbot won’t know about the latest products and services at the business. With […]
( 6
min )
The results are in, and they point to a new era in energy-efficient computing. In tests of real workloads, the NVIDIA Grace CPU Superchip scored 2x performance gains over x86 processors at the same power envelope across major data center CPU applications. That opens up a whole new set of opportunities. It means data centers […]
( 6
min )
Microsoft, Tencent and Baidu are adopting NVIDIA CV-CUDA for computer vision AI. NVIDIA CEO Jensen Huang highlighted work in content understanding, visual search and deep learning Tuesday as he announced the beta release for NVIDIA’s CV-CUDA — an open-source, GPU-accelerated library for computer vision at cloud scale. “Eighty percent of internet traffic is video, user-generated […]
( 6
min )
OpenAssistant bot is live on /r/ask_open_assistant. There are some limitations to the reddit bot; you can also try the model in chat mode at https://huggingface.co/spaces/olivierdehaene/chat-llm-streaming. The model is available for free download at https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b.
Prompt it by creating a new text post (responds to text body of post), starting a comment with !OpenAssistant, or by replying directly to it.
submitted by /u/pixiegirl417
( 44
min )
How to fine-tune Facebook's 30-billion-parameter LLaMA on the Alpaca dataset.
Blog post: https://abuqader.substack.com/p/releasing-alpaca-30b
Weights: https://huggingface.co/baseten/alpaca-30b
submitted by /u/imgonnarelph
( 45
min )
This is a simple wrapper that introduces any imaginable complex context to each question submitted to the OpenAI API. The main goal is to enhance the accuracy of its answers in a TRANSPARENT way for end users.
https://github.com/citiususc/Smarty-GPT
submitted by /u/usc-ur
( 43
min )
I have followed this YouTube tutorial that can be run with my environment, but it seems relatively basic (for context, the game in the video is quite simple while the environment I am using is like a more complex chess)
I have heard of DDQN and other improvements to DQN, but was wondering if there is anything within basic DQN (like maybe stuff to do with the network) that can be tuned to produce better results
(Thanks for taking the time to look at this)
submitted by /u/PainisPingas
( 42
min )
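One commonly cited in-family tweak to the question above is the Double DQN target, which decouples action selection from action evaluation to reduce Q-value overestimation. A minimal NumPy sketch, with `q_online` and `q_target` as stand-ins for the two networks:

```python
import numpy as np

def double_dqn_target(q_online, q_target, next_states, rewards, dones, gamma=0.99):
    """Double DQN: pick argmax actions with the online net, but evaluate
    them with the target net. Drop-in replacement for the vanilla target."""
    best = q_online(next_states).argmax(axis=1)          # action selection
    next_q = q_target(next_states)[np.arange(len(best)), best]  # evaluation
    return rewards + gamma * (1.0 - dones) * next_q
```

Swapping this target into an otherwise unchanged DQN loop is a small change that often stabilizes learning on harder environments.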
This is a guest post co-written with Antony Vance from Intel. Customers are always looking for ways to improve the performance and response times of their machine learning (ML) inference workloads without increasing the cost per transaction and without sacrificing the accuracy of the results. Running ML workloads on Amazon SageMaker running Amazon Elastic Compute […]
( 8
min )
We thought of listing down some tactics that can help you tackle digital challenges smartly. Check out the blog to know what we are talking about.
The post Tackling the Evolving Tech Landscape the Smarter Way appeared first on Data Science Central.
( 21
min )
It’s easy to think of LLMs (large language models) as just ‘hallucinating’ or mere generators of text. A glorified LSTM so to speak. While there are some limitations of LLMs (and indeed they are evolving), a far more interesting question to explore is: How can LLMs be used in enterprise applications? In many ways, enterprise… Read More »Enterprise use cases for GPT-3: How to chat with your own data
The post Enterprise use cases for GPT-3: How to chat with your own data appeared first on Data Science Central.
( 19
min )
AI technologies like ChatGPT are necessitating a fundamental overhaul of our educational systems and institutions. Getting the right answers to predetermined tests is no longer sufficient in an age where AI can access, integrate, and recite knowledge billions if not trillions of times faster than the human mind. So, what are the skills, capabilities, and… Read More »Future of Education: Application not Regurgitation of Knowledge – Part II
The post Future of Education: Application not Regurgitation of Knowledge – Part II appeared first on Data Science Central.
( 23
min )
The e-commerce industry has been at the forefront of transformation in the era of technology — it has reshaped everything from how customers shop to how the whole industry operates. Technology has led to significant changes in the e-commerce and retail industries over the past few years. Consumers now have more access to… Read More »E-commerce in 2023 — Top 5 Tech Trends that will Reshape the Industry
The post E-commerce in 2023 — Top 5 Tech Trends that will Reshape the Industry appeared first on Data Science Central.
( 26
min )
We propose a certainty-equivalence scheme for adaptive control of scalar
linear systems subject to additive, i.i.d. Gaussian disturbances and bounded
control input constraints, without requiring prior knowledge of the bounds of
the system parameters, nor the control direction. Assuming that the system is
at-worst marginally stable, mean square boundedness of the closed-loop system
states is proven. Lastly, numerical examples are presented to illustrate our
results.
( 2
min )
Great success has been achieved in 6-DoF grasp learning from point cloud
input, yet the computational cost due to point set orderlessness remains a
concern. Alternatively, in this paper we explore grasp generation from
RGB-D input. The proposed solution, Keypoint-GraspNet, detects the
projections of the gripper keypoints in image space and then recovers the
SE(3) poses with a PnP algorithm. A synthetic dataset based on primitive
shapes and grasp families is constructed to examine our idea. Metric-based
evaluation reveals that our method outperforms the baselines in terms of
grasp proposal accuracy, diversity, and time cost. Finally, robot
experiments show a high success rate, demonstrating the potential of the
idea in real-world applications.
( 2
min )
We revisit the standard formulation of the tabular actor-critic algorithm as a
two time-scale stochastic approximation with value function computed on a
faster time-scale and policy computed on a slower time-scale. This emulates
policy iteration. We begin by observing that reversal of the time scales will
in fact emulate value iteration and is a legitimate algorithm. We provide a
proof of convergence and compare the two empirically with and without function
approximation (with both linear and nonlinear function approximators) and
observe that our proposed critic-actor algorithm performs on par with
actor-critic in terms of both accuracy and computational effort.
( 2
min )
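The timescale reversal described in the abstract is easy to state in a tabular sketch. The toy two-state MDP and all step-size exponents below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Toy 2-state, 2-action MDP; transition and reward numbers are made up.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])  # P[s, a, s']
R = np.array([[1.0, 0.0], [0.0, 2.0]])    # R[s, a]
gamma = 0.9
rng = np.random.default_rng(0)

V = np.zeros(2)            # critic: tabular value function
theta = np.zeros((2, 2))   # actor: policy logits

s = 0
for t in range(1, 20001):
    z = theta[s] - theta[s].max()          # stable softmax policy
    pi = np.exp(z) / np.exp(z).sum()
    a = rng.choice(2, p=pi)
    s_next = rng.choice(2, p=P[s, a])
    delta = R[s, a] + gamma * V[s_next] - V[s]   # TD error
    grad_log = -pi.copy()
    grad_log[a] += 1.0                     # grad of log pi(a|s)
    # Critic-actor: the POLICY moves on the faster timescale and the
    # VALUE on the slower one -- the reverse of the usual schedule.
    theta[s] += (1.0 / t**0.6) * delta * grad_log
    V[s] += (1.0 / t**0.9) * delta
    s = s_next
```

In the usual actor-critic schedule the two step-size exponents would simply be swapped.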
We examine the problem of regret minimization when the learner is involved in
a continuous game with other optimizing agents: in this case, if all players
follow a no-regret algorithm, it is possible to achieve significantly lower
regret relative to fully adversarial environments. We study this problem in the
context of variationally stable games (a class of continuous games which
includes all convex-concave and monotone games), and when the players only have
access to noisy estimates of their individual payoff gradients. If the noise is
additive, the game-theoretic and purely adversarial settings enjoy similar
regret guarantees; however, if the noise is multiplicative, we show that the
learners can, in fact, achieve constant regret. We achieve this faster rate via
an optimistic gradient scheme with learning rate separation -- that is, the
method's extrapolation and update steps are tuned to different schedules,
depending on the noise profile. Subsequently, to eliminate the need for
delicate hyperparameter tuning, we propose a fully adaptive method that attains
nearly the same guarantees as its non-adapted counterpart, while operating
without knowledge of either the game or of the noise profile.
( 3
min )
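On a toy bilinear min-max game f(x, y) = x·y (a monotone game), the learning-rate separation can be sketched with an extragradient-style update; the step sizes below are assumed values, not the paper's tuned schedules:

```python
def grad_x(x, y):  # d/dx of f(x, y) = x * y (minimizing player)
    return y

def grad_y(x, y):  # d/dy of f(x, y) = x * y (maximizing player)
    return x

x, y = 1.0, 1.0
eta_lead, eta_update = 0.5, 0.1  # separated extrapolation/update rates
for _ in range(2000):
    # extrapolation (leading) step with its own, larger learning rate
    x_lead = x - eta_lead * grad_x(x, y)
    y_lead = y + eta_lead * grad_y(x, y)
    # update step, evaluated at the extrapolated point
    x -= eta_update * grad_x(x_lead, y_lead)
    y += eta_update * grad_y(x_lead, y_lead)
```

With these rates the iterates spiral into the equilibrium (0, 0), whereas plain simultaneous gradient descent-ascent on this game diverges.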
Alzheimer's Disease (AD) is a progressive neurodegenerative disease and the
leading cause of dementia. Early diagnosis is critical for patients to benefit
from potential intervention and treatment. The retina has been hypothesized as
a diagnostic site for AD detection owing to its anatomical connection with the
brain. AI models developed for this purpose have yet to provide a rational
explanation for their decisions, nor to infer the stage of the disease's
progression. Along this direction, we propose a novel model-agnostic
explainable-AI framework, called Granular Neuron-level Explainer (LAVA), an
interpretation prototype that probes into intermediate layers of the
Convolutional Neural Network (CNN) models to assess the AD continuum directly
from the retinal imaging without longitudinal or clinical evaluation. This
method is applied to validate the retinal vasculature as a biomarker and
diagnostic modality for Alzheimer's Disease (AD) evaluation. UK Biobank
cognitive tests and vascular morphological features suggest LAVA shows strong
promise and effectiveness in identifying AD stages across the progression
continuum.
( 2
min )
Performing classification on noisy, crowdsourced image datasets can prove
challenging even for the best neural networks. Two issues which complicate the
problem on such datasets are class imbalance and ground-truth uncertainty in
labeling. The AL-ALL and AL-PUB datasets -- consisting of tightly cropped,
individual characters from images of ancient Greek papyri -- are strongly
affected by both issues. The application of ensemble modeling to such datasets
can help identify images where the ground-truth is questionable and quantify
the trustworthiness of those samples. As such, we apply stacked generalization
consisting of nearly identical ResNets with different loss functions: one
utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler Divergence
(KLD). Both networks use labels drawn from the crowdsourced consensus. For the
second network, the KLD is calculated with respect to the proposed Normalized
Distribution of Annotations (NDA). For our ensemble model, we apply a k-nearest
neighbors model to the outputs of the CXE and KLD networks. Individually, the
ResNet models have approximately 93% accuracy, while the ensemble model
achieves an accuracy of > 95%. We also perform an analysis of the Shannon
entropy of the various models' output distributions to measure classification
uncertainty. Our results suggest that entropy is useful for predicting model
misclassifications.
( 3
min )
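The stacking step itself is simple: treat the two networks' softmax outputs as meta-features and classify them with k-NN. A self-contained sketch with synthetic softmax outputs standing in for the trained CXE and KLD ResNets:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 3
labels = rng.integers(0, n_classes, n_samples)

def synthetic_softmax(noise):
    # Stand-in for a trained network's softmax output on these samples.
    logits = np.eye(n_classes)[labels] * 3.0 + rng.normal(0, noise, (n_samples, n_classes))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Meta-features: concatenated outputs of the "CXE" and "KLD" networks.
features = np.hstack([synthetic_softmax(1.0), synthetic_softmax(1.2)])

def knn_predict(train_x, train_y, query, k=5):
    # Majority vote among the k nearest meta-feature vectors.
    nearest = np.argsort(np.linalg.norm(train_x - query, axis=1))[:k]
    return np.bincount(train_y[nearest], minlength=n_classes).argmax()

split = 150
preds = np.array([knn_predict(features[:split], labels[:split], q)
                  for q in features[split:]])
accuracy = (preds == labels[split:]).mean()
```

The same pattern extends directly to real network outputs: only `features` changes.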
This paper proposes an extension of regression trees by quadratic
unconstrained binary optimization (QUBO). Regression trees are very popular
prediction models that are trainable with tabular datasets, but their accuracy
is insufficient because the decision rules are too simple. The proposed method
extends the decision rules in decision trees to multi-dimensional boundaries.
Such an extension is generally unimplementable because of computational
limitations; however, the proposed method transforms the training process
into a QUBO problem, which enables an annealing machine to solve it.
( 2
min )
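For context, a QUBO instance is simply the minimization of x^T Q x over binary vectors x. On tiny instances, exhaustive search can stand in for the annealing machine; the matrix below is made up for illustration:

```python
import itertools

import numpy as np

Q = np.array([[-1.0, 2.0],
              [0.0, -1.0]])  # toy QUBO matrix (illustrative values)

def qubo_energy(x):
    # Objective value x^T Q x for a binary assignment x.
    x = np.asarray(x)
    return float(x @ Q @ x)

# Exhaustive search over all binary assignments of 2 variables.
best = min(itertools.product([0, 1], repeat=2), key=qubo_energy)
```

Real annealing hardware replaces the exhaustive search, but the objective being minimized keeps exactly this form.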
Predictive modelling is often reduced to finding the best model that
optimizes a selected performance measure. But what if the second-best model
describes the data equally well but in a completely different way? What about
the third? Is it possible that the most effective models learn completely
different relationships in the data? Inspired by Anscombe's quartet, this paper
introduces Rashomon's quartet, a synthetic dataset for which four models from
different classes have practically identical predictive performance. However,
their visualization reveals drastically distinct ways of understanding the
correlation structure in data. The introduced simple illustrative example aims
to further facilitate visualization as a mandatory tool to compare predictive
models beyond their performance. We need to develop insightful techniques for
the explanatory analysis of model sets.
( 2
min )
Transformers achieve great performance on Visual Question Answering (VQA).
However, their systematic generalization capabilities, i.e., handling novel
combinations of known concepts, is unclear. We reveal that Neural Module
Networks (NMNs), i.e., question-specific compositions of modules that tackle a
sub-task, achieve better or similar systematic generalization performance than
the conventional Transformers, even though NMNs' modules are CNN-based. In
order to address this shortcoming of Transformers with respect to NMNs, in this
paper we investigate whether and how modularity can bring benefits to
Transformers. Namely, we introduce Transformer Module Network (TMN), a novel
NMN based on compositions of Transformer modules. TMNs achieve state-of-the-art
systematic generalization performance in three VQA datasets, improving more
than 30% over standard Transformers for novel compositions of sub-tasks. We
show that not only the module composition but also the module specialization
for each sub-task is key to this performance gain.
( 2
min )
In this paper we provide a generalization of the concept of cohesion as
introduced recently by Berenhaut, Moore and Melvin [Proceedings of the National
Academy of Sciences, 119 (4) (2022)]. The formulation presented builds on the
technique of partitioned local depth by distilling two key probabilistic
concepts: local relevance and support division. Earlier results are extended
within the new context, and examples of applications to revealing communities
in data with uncertainty are included.
( 2
min )
Tabular question answering (TQA) presents a challenging setting for neural
systems by requiring joint reasoning of natural language with large amounts of
semi-structured data. Unlike humans who use programmatic tools like filters to
transform data before processing, language models in TQA process tables
directly, resulting in information loss as table size increases. In this paper
we propose ToolWriter to generate query specific programs and detect when to
apply them to transform tables and align them with the TQA model's
capabilities. Focusing ToolWriter on generating row-filtering tools improves the
state-of-the-art for WikiTableQuestions and WikiSQL with the most performance
gained on long tables. By investigating headroom, our work highlights the
broader potential for programmatic tools combined with neural components to
manipulate large amounts of structured data.
( 2
min )
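A row-filtering tool of the kind described shrinks the table before the TQA model ever sees it; a minimal illustration, where the example table and filter are hypothetical:

```python
table = [
    {"city": "Paris", "country": "France", "pop_m": 2.1},
    {"city": "Lyon", "country": "France", "pop_m": 0.5},
    {"city": "Osaka", "country": "Japan", "pop_m": 2.7},
]

def row_filter(rows, column, value):
    """Keep only rows whose `column` equals `value`; the reduced table is
    what would then be handed to the question-answering model."""
    return [r for r in rows if r[column] == value]

french = row_filter(table, "country", "France")
```

On long tables this kind of programmatic pre-filtering is exactly where the abstract reports the largest gains.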
Many natural language processing tasks benefit from long inputs, but
processing long documents with Transformers is expensive -- not only due to
quadratic attention complexity but also from applying feedforward and
projection layers to every token. However, not all tokens are equally
important, especially for longer documents. We propose CoLT5, a long-input
Transformer model that builds on this intuition by employing conditional
computation, devoting more resources to important tokens in both feedforward
and attention layers. We show that CoLT5 achieves stronger performance than
LongT5 with much faster training and inference, achieving SOTA on the
long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably
make use of extremely long inputs, showing strong gains up to 64k input length.
( 2
min )
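The conditional-computation idea can be sketched as routing only the top-k tokens through a heavy branch while a light branch processes everything; the scorer and branch functions below are stand-ins, not CoLT5's actual modules:

```python
import numpy as np

def conditional_layer(tokens, k, light, heavy, scorer):
    """Apply `light` to all tokens, and add the `heavy` branch only for
    the k tokens the scorer ranks as most important."""
    idx = np.argsort(scorer(tokens))[-k:]   # indices of top-k tokens
    out = light(tokens)                      # cheap path for every token
    out[idx] = out[idx] + heavy(tokens[idx])  # extra capacity where needed
    return out
```

Because the heavy branch runs on only k tokens, cost grows with k rather than with sequence length, which is what makes very long inputs tractable.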
This paper describes our participation in the shared task of hate speech
detection, which is one of the subtasks of the CERIST NLP Challenge 2022. Our
experiments evaluate the performance of six transformer models and their
combination using 2 ensemble approaches. The best results on the training set,
in a five-fold cross validation scenario, were obtained by using the ensemble
approach based on the majority vote. The evaluation of this approach on the
test set resulted in an F1-score of 0.60 and an Accuracy of 0.86.
( 2
min )
Many imaging inverse problems -- such as image-dependent in-painting and
dehazing -- are challenging because their forward
models are unknown or depend on unknown latent parameters. While one can solve
such problems by training a neural network with vast quantities of paired
training data, such paired training data is often unavailable. In this paper,
we propose a generalized framework for training image reconstruction networks
when paired training data is scarce. In particular, we demonstrate the ability
of image denoising algorithms and, by extension, denoising diffusion models to
supervise network training in the absence of paired training data.
( 2
min )
This paper presents the winning system for the zero-shot Spanish framing
detection task, which also achieves competitive places in eight additional
languages. The challenge of the framing detection task lies in identifying a
set of 14 frames when only a few or zero samples are available, i.e., a
multilingual multi-label few- or zero-shot setting. Our developed solution
employs a pre-training procedure based on multilingual Transformers using a
label-aware contrastive loss function. In addition to describing the system, we
perform an embedding space analysis and ablation study to demonstrate how our
pre-training procedure supports framing detection to advance computational
framing analysis.
( 2
min )
Collective motion is a ubiquitous phenomenon in nature, inspiring engineers,
physicists and mathematicians to develop mathematical models and bio-inspired
designs. Collective motion at small to medium group sizes ($\sim$10-1000
individuals, also called the `mesoscale'), can show nontrivial features due to
stochasticity. Therefore, characterizing both the deterministic and stochastic
aspects of the dynamics is crucial in the study of mesoscale collective
phenomena. Here, we use a physics-inspired, neural-network based approach to
characterize the stochastic group dynamics of interacting individuals, through
a stochastic differential equation (SDE) that governs the collective dynamics
of the group. We apply this technique on both synthetic and real-world
datasets, and identify the deterministic and stochastic aspects of the dynamics
using drift and diffusion fields, enabling us to make novel inferences about
the nature of order in these systems.
( 2
min )
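The drift field referred to above can be estimated by binned conditional averages of increments. A 1-D sketch on a simulated Ornstein-Uhlenbeck process, where the SDE and all parameters are illustrative rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n = 0.01, 100000
x = np.zeros(n)
for i in range(n - 1):  # Euler-Maruyama simulation of dX = -X dt + 0.5 dW
    x[i + 1] = x[i] - x[i] * dt + 0.5 * np.sqrt(dt) * rng.normal()

# Drift field estimate: E[dX | X in bin] / dt, via binned averages.
bins = np.linspace(-0.6, 0.6, 13)
centers = 0.5 * (bins[1:] + bins[:-1])
which = np.digitize(x[:-1], bins) - 1
dx = np.diff(x)
drift = np.array([dx[which == b].mean() / dt for b in range(12)])
```

For this process the estimated drift should track -x across the bin centers; the diffusion field is obtained the same way from the binned second moments of `dx`.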
Significant advancements in type 1 diabetes treatment have been made in the
development of state-of-the-art Artificial Pancreas Systems (APS). However,
lapses currently exist in the timely treatment of unsafe blood glucose (BG)
levels, especially in the case of rebound hyperglycemia. We propose a machine
learning (ML) method for predictive BG scenario categorization that outputs
messages alerting the patient to upcoming BG trends to allow for earlier,
educated treatment. In addition to standard notifications of predicted
hypoglycemia and hyperglycemia, we introduce BG scenario-specific alert
messages and the preliminary steps toward precise basal suggestions for the
prevention of rebound hyperglycemia. Experimental evaluation on the DCLP3
clinical dataset achieves >98% accuracy and >79% precision for predicting
rebound high events for patient alerts.
( 2
min )
Projection-based model order reduction on nonlinear manifolds has been
recently proposed for problems with slowly decaying Kolmogorov n-width such as
advection-dominated ones. These methods often use neural networks for manifold
learning and showcase improved accuracy over traditional linear
subspace-reduced order models. A disadvantage of the previously proposed
methods is the potential high computational costs of training the networks on
high-fidelity solution snapshots. In this work, we propose and analyze a novel
method that overcomes this disadvantage by training a neural network only on
subsampled versions of the high-fidelity solution snapshots. This method
coupled with collocation-based hyper-reduction and Gappy-POD allows for
efficient and accurate surrogate models. We demonstrate the validity of our
approach on a 2d Burgers problem.
( 2
min )
Previously, we proposed a probabilistic data generation model represented by
an unobservable tree and a sequential updating method to calculate a posterior
distribution over a set of trees. The set is called a meta-tree. In this paper,
we propose a more efficient batch updating method.
( 2
min )
Adversarial examples are inputs to machine learning models that an attacker
has intentionally designed to confuse the model into making a mistake. Such
examples pose a serious threat to the applicability of machine-learning-based
systems, especially in life- and safety-critical domains. To address this
problem, the area of adversarial robustness investigates mechanisms behind
adversarial attacks and defenses against these attacks. This survey reviews
literature that focuses on the effects of data used by a model on the model's
adversarial robustness. It systematically identifies and summarizes the
state-of-the-art research in this area and further discusses gaps of knowledge
and promising future research directions.
( 2
min )
Multivariate networks are commonly found in real-world data-driven
applications. Uncovering and understanding the relations of interest in
multivariate networks is not a trivial task. This paper presents a visual
analytics workflow for studying multivariate networks to extract associations
between different structural and semantic characteristics of the networks
(e.g., what are the combinations of attributes largely relating to the density
of a social network?). The workflow consists of a neural-network-based learning
phase to classify the data based on the chosen input and output attributes, a
dimensionality reduction and optimization phase to produce a simplified set of
results for examination, and finally an interpreting phase conducted by the
user through an interactive visualization interface. A key part of our design
is a composite variable construction step that remodels nonlinear features
obtained by neural networks into linear features that are intuitive to
interpret. We demonstrate the capabilities of this workflow with multiple case
studies on networks derived from social media usage and also evaluate the
workflow through an expert interview.
( 2
min )
For the multivariate linear regression model with unknown covariance, the
corrected Akaike information criterion is the minimum variance unbiased
estimator of the expected Kullback--Leibler discrepancy. In this study, based
on the loss estimation framework, we show its inadmissibility as an estimator
of the Kullback--Leibler discrepancy itself, instead of the expected
Kullback--Leibler discrepancy. We provide improved estimators of the
Kullback--Leibler discrepancy that work well in reduced-rank situations and
examine their performance numerically.
( 2
min )
The kernel-based method has been successfully applied in linear system
identification using stable kernel designs. From a Gaussian process
perspective, it automatically provides probabilistic error bounds for the
identified models from the posterior covariance, which are useful in robust and
stochastic control. However, the error bounds require knowledge of the true
hyperparameters in the kernel design and are demonstrated to be inaccurate with
estimated hyperparameters for lightly damped systems or in the presence of high
noise. In this work, we provide reliable quantification of the estimation error
when the hyperparameters are unknown. The bounds are obtained by first
constructing a high-probability set for the true hyperparameters from the
marginal likelihood function and then finding the worst-case posterior
covariance within the set. The proposed bound is proven to contain the true
model with a high probability and its validity is verified in numerical
simulation.
( 2
min )
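The posterior covariance that these error bounds start from has the standard Gaussian-process form; a generic sketch, where the squared-exponential kernel and the noise level are assumptions rather than the paper's kernel design:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel between two 1-D input sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_posterior(x_train, y_train, x_test, sigma2=0.01):
    """Posterior mean and variance; +/- 2*sqrt(var) gives the usual
    probabilistic error bound when the hyperparameters are known."""
    K = rbf(x_train, x_train) + sigma2 * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = rbf(x_test, x_test).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var
```

The paper's contribution is, in effect, to replace the single posterior variance above with a worst case over a high-probability hyperparameter set.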
Hi guys, just wanted to share a new app I worked on which uses ChatGPT to recommend gift ideas based on interests and remind you of birthdays. Please let me know what you think :)
https://apps.apple.com/de/app/giftgo-gift-ideas-with-ai/id1660850886?l=en
submitted by /u/SmoresDaniel
( 41
min )
Recently, John Carmack suggested the creation of a "canonical list of references from a leading figure," referring to a never-released reading list given to him by Ilya Sutskever.
While there may be an undue interest in that specific list, MLR is such a big field that it's difficult to know where to start. What are the major papers relevant to state-of-the-art work being done in 2023? Perhaps we may crowd-source a list here?
submitted by /u/alfredr
When deploying ML models with FastAPI, we always had to write our own serialisation code for numpy.ndarray and PIL.Image. Not only did we replace FastAPI with a C-level library that is up to 100x faster a couple of weeks ago, but we have also recently added support for all the fancy Pythonic types on both the client and server sides.
Check it out on GitHub/Unum-Cloud/UJRPC
https://preview.redd.it/3m73l6qodpoa1.png?width=1648&format=png&auto=webp&s=975d47f7f35a6a842a3454cccb24dd92e08816e0
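The numpy.ndarray serialisation the post mentions can be sketched generically. This is not UJRPC's actual wire format — the function names below are hypothetical — just an illustration of the kind of round-trip (dtype + shape + base64 payload) such a library has to handle:

```python
import base64
import json

import numpy as np

def ndarray_to_json(a):
    """Serialise an ndarray as JSON: dtype and shape metadata plus a
    base64-encoded copy of the raw buffer."""
    return json.dumps({
        "dtype": str(a.dtype),
        "shape": a.shape,
        "data": base64.b64encode(np.ascontiguousarray(a).tobytes()).decode(),
    })

def ndarray_from_json(s):
    """Inverse of ndarray_to_json: rebuild the array from the metadata."""
    d = json.loads(s)
    buf = base64.b64decode(d["data"])
    return np.frombuffer(buf, dtype=d["dtype"]).reshape(d["shape"])

x = np.arange(12, dtype=np.float32).reshape(3, 4)
y = ndarray_from_json(ndarray_to_json(x))
```

A binary protocol would skip the base64 step entirely, which is part of where the speedup over JSON-over-HTTP comes from.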
submitted by /u/vov_or
Preliminary results give credence to some of the claims made by OpenAI regarding performance gains achieved by GPT-4 across domains. Unanswered questions remain regarding training data used and possible leakage. Tools used were Langchain and the current API endpoints (chatgpt-3.5-turbo and gpt-4).
https://twitter.com/K_Hebenstreit/status/1636789765189308416
submitted by /u/N00B1ST
Is there a single-task, multi-scene environment with a continuous action space? Single-task, multi-scene environments include gym-super-mario-bros and CoinRun in Procgen, but they all have discrete action spaces. Thank you!
submitted by /u/Substantial_Lake_236
In this video, you will learn how to save your conversations with ChatGPT as PDF, PNG or JSON files. The tutorial will guide you through the simple steps to export your conversations in different formats for various purposes.
https://youtu.be/eMqLFrk_tes
submitted by /u/aeiswhatiwant
I'm struggling with the idea of the actual game state, the portion of it I use in the abstracted game state, and the Markov or memorylessness property.
The game is called the "lizard game" from this video and has a simple 3x3 grid where the agent (a lizard) starts in the bottom left and moves about, trying to maximize rewards:
+--------------+--------------+--------------+
| crickets(1)  |              |              |
+--------------+--------------+--------------+
|              | bird         |              |
+--------------+--------------+--------------+
| lizard       |              | crickets(5)  |
+--------------+--------------+--------------+
The rewards are simple:
moving into an empty spot yields -1
moving into crickets(1) yields +2
moving into crickets(5) yields +10 and terminates the episode
moving into bird y…
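The setup above can be written down as a tiny environment whose state is just the lizard's grid position — which is exactly what makes it Markov: the next state and reward depend only on the current cell and action, not on the history. A minimal sketch (the bird's reward is truncated in the excerpt, so it is deliberately left unspecified rather than guessed; treating a wall bump as an empty-spot move is an assumption):

```python
# 3x3 grid, positions are (row, col) with row 0 at the top, matching the table.
START = (2, 0)                                  # lizard's starting cell
GRID = {(0, 0): "crickets1", (1, 1): "bird", (2, 2): "crickets5"}

# Rewards stated in the post; the bird's reward is cut off above, so it is
# omitted here instead of invented.
REWARD = {"empty": -1, "crickets1": 2, "crickets5": 10}
TERMINAL = {"crickets5"}                        # crickets(5) ends the episode

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, action):
    """Apply one action; returns (next_pos, reward_or_None, done)."""
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    if not (0 <= r < 3 and 0 <= c < 3):         # bumping a wall: stay put
        r, c = pos
    cell = GRID.get((r, c), "empty")
    return (r, c), REWARD.get(cell), cell in TERMINAL
```

The point is that (row, col) plus the fixed reward table is everything the agent needs; the full trajectory history adds no information, which is the memorylessness property the question is about.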
I could never figure this part out all these years, and now that I am doing a YouTube series on it, I have to make sure I understand it before I publish the next video. Here is a snippet from the paper 'Regret Minimization in Games with Incomplete Information' that I am specifically referring to. In eq. 4, the policy averaging considers the probability of the player's past actions.
This is from the 2008 paper. In later papers that do Monte Carlo CFR, that term disappears, and those algorithms average the policies directly without considering the player's own path probability. Why is that?
submitted by /u/abstractcontrol
I try the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of the box without any finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else).
You are welcome to try it in RWKV 14B Gradio (click examples below the panel):
https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio
Tips: try "Expert Response" or "Expert Long Response" or "Expert Full Response" too.
ChatRWKV v2 is now using a CUDA kernel to optimize INT8 inference (23 token/s on 3090): https://github.com/BlinkDL/ChatRWKV
Upgrade to the latest code, run "pip install rwkv --upgrade" (to 0.5.0), and set os.environ["RWKV_CUDA_ON"] = '1' in v2/chat.py to enjoy the speed.
The inference speed (and VRAM consumption) of RWKV is independent of ctxlen, because it's an RNN (note: currently the preprocessing of a long prompt takes more VRAM but that can be optimized because we can process in chunks).
Meanwhile I find the latest RWKV-4-Pile-14B-20230313-ctx8192-test1050 model can utilize a long context.
submitted by /u/bo_peng
Organizations use messaging platforms like Microsoft Teams to bring the right people together to securely communicate with each other and collaborate to get work done. Microsoft Teams captures invaluable organizational knowledge in the form of the information that flows through it as users collaborate. However, making this knowledge easily and securely available to users can […]
We tend to impute AI with human-like qualities. However, choosing to give your AI system a personality has its advantages and…
As artificial intelligence (AI) continues to advance and become more pervasive in our daily lives, it is crucial that we consider the…
Artificial Intelligence (AI) has transformed the way we live, work, and communicate, and it is now playing a significant role in the art…
Image-to-image reconstruction problems with free or inexpensive metadata in
the form of class labels appear often in biological and medical image domains.
Existing text-guided or style-transfer image-to-image approaches do not
translate to datasets where additional information is provided as discrete
classes. We introduce and implement a model which combines image-to-image and
class-guided denoising diffusion probabilistic models. We train our model on a
real-world dataset of microscopy images used for drug discovery, with and
without incorporating metadata labels. By exploring the properties of
image-to-image diffusion with relevant labels, we show that class-guided
image-to-image diffusion can improve the meaningful content of the
reconstructed images and outperform the unguided model in useful downstream
tasks.
Neural network approaches to approximate the ground state of quantum
Hamiltonians require the numerical solution of a highly nonlinear optimization
problem. We introduce a statistical learning approach that makes the
optimization trivial by using kernel methods. Our scheme is an approximate
realization of the power method, where supervised learning is used to learn the
next step of the power iteration. We show that the ground state properties of
arbitrary gapped quantum Hamiltonians can be reached with polynomial resources
under the assumption that the supervised learning is efficient. Using kernel
ridge regression, we provide numerical evidence that the learning assumption is
verified by applying our scheme to find the ground states of several
prototypical interacting many-body quantum systems, both in one and two
dimensions, showing the flexibility of our approach.
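The structure of the scheme can be illustrated on a toy problem: replace the Hamiltonian with a small symmetric matrix, learn the map v -> Av by kernel ridge regression on random samples, and run the power iteration using the learned step. This is only a sketch of the idea, not the paper's actual wavefunction setup; the matrix, kernel, and hyperparameters below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a (shifted) Hamiltonian: the dominant eigenvector of this
# symmetric matrix plays the role of the ground state.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # eigenvalues 1, 2, 4

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Supervised data: random states x paired with one power-method step A @ x.
X_train = rng.normal(size=(300, 3))
Y_train = X_train @ A.T

# Kernel ridge regression in dual form: alpha = (K + lam*I)^(-1) Y.
lam = 1e-3
alpha = np.linalg.solve(rbf(X_train, X_train) + lam * np.eye(len(X_train)),
                        Y_train)

def learned_step(v):
    """Approximate v -> A v using the fitted kernel ridge regressor."""
    return (rbf(v[None, :], X_train) @ alpha)[0]

# Power iteration driven entirely by the learned map.
v = rng.normal(size=3)
for _ in range(100):
    v = learned_step(v)
    v /= np.linalg.norm(v)

ground_truth = np.array([1.0, 2.0, 1.0]) / np.sqrt(6)  # eigenvector for 4
overlap = abs(v @ ground_truth)
```

If the regressor is faithful, the iteration converges near the dominant eigenvector; in the paper this role is played by the ground state of a gapped Hamiltonian, and the learning step is what keeps the optimization trivial.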
Sequential decision making in the real world often requires finding a good
balance of conflicting objectives. In general, there exists a plethora of
Pareto-optimal policies that embody different patterns of compromises between
objectives, and it is technically challenging to obtain them exhaustively using
deep neural networks. In this work, we propose a novel multi-objective
reinforcement learning (MORL) algorithm that trains a single neural network via
policy gradient to approximately obtain the entire Pareto set in a single run
of training, without relying on linear scalarization of objectives. The
proposed method works in both continuous and discrete action spaces with no
design change of the policy network. Numerical experiments in benchmark
environments demonstrate the practicality and efficacy of our approach in
comparison to standard MORL baselines.
Figuring out small molecule binding sites in target proteins, in the
resolution of either pocket or residue, is critical in many virtual and real
drug-discovery scenarios. Since it is not always easy to find such binding
sites based on domain knowledge or traditional methods, different deep learning
methods that predict binding sites out of protein structures have been
developed in recent years. Here we present a new such deep learning algorithm,
that significantly outperformed all state-of-the-art baselines at both
resolutions, pocket and residue. This good performance was
also demonstrated in a case study involving the protein human serum albumin and
its binding sites. Our algorithm included new ideas both in the model
architecture and in the training method. For the model architecture, it
incorporated SE(3)-invariant geometric self-attention layers that operate on
top of residue-level CNN outputs. This residue-level processing of the model
allowed a transfer learning between the two resolutions, which turned out to
significantly improve the binding pocket prediction. Moreover, we developed a
novel augmentation method based on protein homology, which prevented our model
from over-fitting. Overall, we believe that our contribution to the literature
is twofold. First, we provided a new computational method for binding site
prediction that is relevant to real-world applications, as shown by the good
performance on different benchmarks and the case study. Second, the novel ideas
in our method (the model architecture, transfer learning, and the homology
augmentation) would serve as useful components in
future works.
The secret’s out. Thanks to ChatGPT, everyone knows about the power of modern AI. To find out what’s coming next, tune in to NVIDIA founder and CEO Jensen Huang’s keynote address at NVIDIA GTC on Tuesday, March 21, at 8 a.m. Pacific. Huang will share his vision for the future of AI and how NVIDIA Read article >
Hello everyone. As a side project, I created a website that generated over 7,000 articles in one week, each with roughly 800 to 1,000 words, all using the GPT-3.5 Turbo API in a fully automated manner. I created a Python script (also generated by GPT) where I feed in a list of topics, and it generates the content and automatically posts it on WordPress. In addition, I integrated the Google Images API to fetch an image and post it automatically as well. Currently, I can create around 10 posts per minute. And the cost? To generate these 7,000 posts with 7,000 images, I have spent $40 so far!
So far, however, I don't know how Google or Bing will handle this AI-generated content and whether it will affect SEO, but I'm here to find out.
If you are interested in how I did it, along with some videos, check my post: https://www.tigove.com/how/how-i-created-a-website-with-7000-post-with-chatgpt/
submitted by /u/maurimbr
https://medium.com/@wiroll/fake-news-chatbots-and-the-state-of-journalism-bf95c187e582
Basically...I (ChatGPT) wrote an op-ed with the essential hypothesis of, "let's double speeds in school zones in the name of safety" and...it got published...in a place I don't live...with no verification.
Problematic?
submitted by /u/KillBosby
Part 1: Understanding Zero-Shot Learning
We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 (248M parameters) model implemented in HuggingFace. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI).
In ~20 hours on a single GPU, we achieve ~40 RougeL on the SNI test set, compared to ~42 RougeL of the original model available on HuggingFace Hub and pre-trained through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.
Our core contribution is not the T5 model itself, which follows the HuggingFace implementation. Instead, we optimise everything else in the training pipeline to offer you a user-friendly starting template for your NLP application/research.
We are keen to hear your suggestions to improve the codebase further.
Github: https://github.com/PiotrNawrot/nanoT5
Twitter: https://twitter.com/p_nawrot/status/1636373725397520384
submitted by /u/korec1234
bloomz.cpp allows running inference of BLOOM-like models in pure C/C++ (inspired by llama.cpp). It supports all models that can be loaded with BloomForCausalLM.from_pretrained(). For example, you can achieve 16 tokens per second on an M1 Pro.
submitted by /u/hackerllama
Hello! I read the following article about Microsoft laying off their AI Ethics team: https://www.cmswire.com/customer-experience/microsoft-cuts-ai-ethics-and-society-team-as-part-of-layoffs/
In your experience, what value do AI ethics teams add? Do they actually provide useful insight, or do they serve more as a PR function? I've heard conflicting anecdotes on each side. Is there anything you think AI ethics as a field can do to be more useful and to effect more change? Thanks!
submitted by /u/namey-name-name
[link] [comments]
An update is now available for NVIDIA Canvas, the free beta app that harnesses the power of AI to help artists quickly turn simple brushstrokes into realistic landscapes.
Disney Dreamlight Valley is streaming from Steam and Epic Games Store on GeForce NOW starting today. It’s one of two new games this week that members can stream with beyond-fast performance using a GeForce NOW Ultimate membership. Game as if using a PC on any device — at up to 4K resolution and 120 frames Read article >
Peter Ma was bored in his high school computer science class. So he decided to teach himself something new: how to use artificial intelligence to find alien life. That’s how he eventually became the lead author of a groundbreaking study published in Nature Astronomy. The study reveals how Ma and his co-authors used AI to Read article >
Python comes across as an object-oriented high-level programming language with dynamic semantics that allows rapid application development. It has become a general-purpose programming language for a number of reasons. It is the ready pick for data science enthusiasts; who look forward to majoring in the field with the requisite essentials. Not just that, Python has… Read More »What Makes Python a Quick Pick for Data Analysis and Data Science?
The post What Makes Python a Quick Pick for Data Analysis and Data Science? appeared first on Data Science Central.
This is a study on the potential widespread usage of alternative fuel
vehicles, linking them with the socio-economic status of the respective
consumers as well as the impact on the resulting air quality index. Research in
this area aims to leverage machine learning techniques in order to promote
appropriate policies for the proliferation of alternative fuel vehicles such as
electric vehicles with due justice to different population groups. The Pearson
correlation coefficient is deployed in modeling the relationships between
socio-economic data, air quality index and data on alternative fuel vehicles.
Linear regression is used to conduct predictive modeling on air quality index
as per the adoption of alternative fuel vehicles, based on socio-economic
factors. This work exemplifies artificial intelligence for social good.
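On hypothetical toy data (not the study's dataset — the variables, effect sizes, and noise below are illustrative assumptions), the two modeling steps described above look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical synthetic data standing in for the study's variables:
# per-region alternative-fuel-vehicle adoption, income, and an AQI that is
# assumed to improve (drop) with adoption.
afv_adoption = rng.uniform(0.0, 0.3, size=n)
income = rng.normal(60_000, 10_000, size=n)
aqi = 80 - 100 * afv_adoption + rng.normal(0, 3, size=n)

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

r = pearson(afv_adoption, aqi)        # relationship strength (negative here)

# Linear regression of AQI on adoption and income via ordinary least squares.
X = np.column_stack([np.ones(n), afv_adoption, income])
beta, *_ = np.linalg.lstsq(X, aqi, rcond=None)
aqi_pred = X @ beta                   # predictive model of the AQI
```

The correlation step quantifies the pairwise relationships; the regression step then predicts the air quality index from adoption and socio-economic covariates, which is the structure the abstract describes.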
Moiré engineering in atomically thin van der Waals heterostructures creates
artificial quantum materials with designer properties. We solve the many-body
problem of interacting electrons confined to a moiré superlattice potential
minimum (the moiré atom) using a 2D fermionic neural network. We show that
strong Coulomb interactions in combination with the anisotropic moiré
potential lead to striking "Wigner molecule" charge density distributions
observable with scanning tunneling microscopy.
Diffusion models have become a popular approach for image generation and
reconstruction due to their numerous advantages. However, most diffusion-based
inverse problem-solving methods only deal with 2D images, and even recently
published 3D methods do not fully exploit the 3D distribution prior. To address
this, we propose a novel approach using two perpendicular pre-trained 2D
diffusion models to solve the 3D inverse problem. By modeling the 3D data
distribution as a product of 2D distributions sliced in different directions,
our method effectively addresses the curse of dimensionality. Our experimental
results demonstrate that our method is highly effective for 3D medical image
reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing
MRI, and sparse-view CT. Our method can generate high-quality voxel volumes
suitable for medical applications.
Artwork recommendation is challenging because it requires understanding how
users interact with highly subjective content, the complexity of the concepts
embedded within the artwork, and the emotional and cognitive reflections they
may trigger in users. In this paper, we focus on efficiently capturing the
elements (i.e., latent semantic relationships) of visual art for personalized
recommendation. We propose and study recommender systems based on textual and
visual feature learning techniques, as well as their combinations. We then
perform a small-scale and a large-scale user-centric evaluation of the quality
of the recommendations. Our results indicate that textual features compare
favourably with visual ones, whereas a fusion of both captures the most
suitable hidden semantic relationships for artwork recommendation. Ultimately,
this paper contributes to our understanding of how to deliver content that
suitably matches the user's interests and how they are perceived.
Adversarial training (AT) methods have been found to be effective against
adversarial attacks on deep neural networks. Many variants of AT have been
proposed to improve its performance. Pang et al. [1] have recently shown that
incorporating hypersphere embedding (HE) into the existing AT procedures
enhances robustness. We observe that the existing AT procedures are not
designed for the HE framework, and thus fail to adequately learn the angular
discriminative information available in the HE framework. In this paper, we
propose integrating HE into AT with regularization terms that exploit the rich
angular information available in the HE framework. Specifically, our method,
termed angular-AT, adds regularization terms to AT that explicitly enforce
weight-feature compactness and inter-class separation; all expressed in terms
of angular features. Experimental results show that angular-AT further improves
adversarial robustness.
The performance of fault diagnosis systems is highly affected by data quality
in cyber-physical power systems. These systems generate massive amounts of data
that overburden the system with excessive computational costs. Another issue is
the presence of noise in recorded measurements, which prevents building a
precise decision model. Furthermore, the diagnostic model is often provided
with a mixture of redundant measurements that may divert it from learning
normal and fault distributions. This paper presents the effect of feature
engineering on mitigating the aforementioned challenges in cyber-physical
systems. Feature selection and dimensionality reduction methods are combined
with decision models to simulate data-driven fault diagnosis in a 118-bus power
system. A comparative study is conducted to compare several advanced
techniques in both domains. Dimensionality reduction and feature selection
methods are compared both jointly and separately. Finally, experiments are
concluded, and a setting is suggested that enhances data quality for fault
diagnosis.
The outbreak of the COVID-19 pandemic revealed the criticality of timely
intervention in a situation exacerbated by a shortage in medical staff and
equipment. Pain-level screening is the initial step toward identifying the
severity of patient conditions. Automatic recognition of state and feelings
helps in identifying patient symptoms, taking immediate adequate action, and
providing a patient-centric medical plan tailored to a patient's state. In this
paper, we propose a framework for pain-level detection for deployment in the
United Arab Emirates and assess its performance using the most used approaches
in the literature. Our results show that a deployment of a pain-level deep
learning detection framework is promising in identifying the pain level
accurately.
Several approximate inference methods have been proposed for deep discrete
latent variable models. However, non-parametric methods which have previously
been successfully employed for classical sparse coding models have largely been
unexplored in the context of deep models. We propose a non-parametric iterative
algorithm for learning discrete latent representations in such deep models.
Additionally, to learn scale invariant discrete features, we propose local data
scaling variables. Lastly, to encourage sparsity in our representations, we
propose a Beta-Bernoulli process prior on the latent factors. We evaluate our
sparse coding model coupled with different likelihood models across datasets
with varying characteristics and compare our results to
current amortized approximate inference methods.
Hall effect thrusters are among the most versatile and popular electric
propulsion systems for space use. Industry trends towards interplanetary
missions are driving advances in the design of such propulsion systems. It is
understood that correct sizing of the discharge channel in a Hall effect
thruster greatly impacts performance. Since the complete physics model of such
a propulsion system is not yet optimized for fast computations and design
iterations, most thrusters are designed using so-called scaling laws. This
work, however, focuses on a rather novel approach, which is outlined less
frequently in the literature than the ordinary scaling design approach. Using
deep machine learning, it is possible to create a predictive performance model
that can be used to effortlessly obtain a Hall thruster design with the
required characteristics, using far less computational power than designing
from scratch and with far more flexibility than the usual scaling approach.
Our research deals with the optimization version of the set partition
problem, where the objective is to minimize the absolute difference between the
sums of the two disjoint partitions. Although this problem is known to be
NP-hard and requires exponential time to solve, we propose a less demanding
version of this problem where the goal is to find a locally optimal solution.
In our approach, we consider local optimality with respect to any movement of
at most two elements. To accomplish this, we developed an algorithm that can
generate a locally optimal solution in at most $O(N^2)$ time and $O(N)$ space.
Our algorithm can handle arbitrary input precisions and does not require
positive or integer inputs. Hence, it can be applied in various problem
scenarios with ease.
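A minimal local-search sketch of this idea follows. It is illustrative only: it repeatedly applies the best improving move of at most two elements (move one item across, or swap one from each side) until none exists, reaching a local optimum of the kind described, but without the paper's O(N^2) worst-case guarantee; the function name is ours:

```python
def partition_difference(nums):
    """Local search for the two-way set-partition optimisation problem.

    Returns (side_a, side_b, abs_difference). The state is locally optimal
    with respect to any movement of at most two elements. Works for
    arbitrary-precision, non-integer, and negative inputs alike.
    """
    s = sorted(nums, reverse=True)
    a, b = list(s[0::2]), list(s[1::2])       # greedy interleaved start
    diff = sum(a) - sum(b)

    while True:
        best, move = abs(diff), None
        for i, x in enumerate(a):             # move x from a to b
            if abs(diff - 2 * x) < best:
                best, move = abs(diff - 2 * x), ("a->b", i, None)
        for j, y in enumerate(b):             # move y from b to a
            if abs(diff + 2 * y) < best:
                best, move = abs(diff + 2 * y), ("b->a", None, j)
        for i, x in enumerate(a):             # swap x in a with y in b
            for j, y in enumerate(b):
                cand = abs(diff - 2 * x + 2 * y)
                if cand < best:
                    best, move = cand, ("swap", i, j)
        if move is None:                      # no improving 2-element move
            return a, b, abs(diff)
        kind, i, j = move
        if kind == "a->b":
            b.append(a.pop(i))
        elif kind == "b->a":
            a.append(b.pop(j))
        else:
            a[i], b[j] = b[j], a[i]
        diff = sum(a) - sum(b)
```

Termination is guaranteed because the absolute difference strictly decreases on every applied move, so no partition repeats.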
who's applying and what are you planning to build??? https://www.axios.com/2023/03/15/mozilla-responsible-ai-challenge
submitted by /u/joodfish
Here are the samples. My favourite is this one! Which one is your favourite?
These samples are the product of a transformer (encoder) model trained on only 3 hours of music. Each sample is seeded by the first four bars of a real piece of music. These are the final samples before I completely overhaul the pre-training stage. The idea is to go from about 2 hours of MIDI to over 500 hours. I'm very excited to see how this affects the sample quality.
If anyone is interested in following the project, star the GitHub repo and follow me on Twitter.
submitted by /u/ustainbolt
Baidu will unveil its conversational AI ERNIE Bot, powered by Baidu's in-house LLMs, on March 16. The ERNIE LLM was first proposed as a language understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters.
ERNIE 1.0: https://arxiv.org/abs/1904.09223
ERNIE 2.0: https://arxiv.org/abs/1907.12412
ERNIE 3.0: https://arxiv.org/abs/2112.12731
ERNIE for text-to-image: https://arxiv.org/abs/2210.15257
ERNIE Bot live-stream on YouTube: https://www.youtube.com/watch?v=ukvEUI3x0vI
submitted by /u/kizumada
Hello everyone,
I'd like to show you a "working AlphaZero implementation that's simple enough to be able to understand what's going on at a quick glance, without sacrificing too much."
Link: https://github.com/scascin0/alphazero
submitted by /u/ayan0k0ji
Global leader in convenient foods and beverages PepsiCo is deploying advanced machine vision technology from startup KoiReader Technologies, powered by the NVIDIA AI platform and GPUs, to improve efficiency and accuracy in its distribution process. PepsiCo has identified KoiReader’s technology as a solution to enable greater efficiency in reading warehouse labels. This AI-powered innovation helps Read article >
It all started with two software engineers and a tomato farmer on a West Coast road trip. Visiting farms to survey their needs, the three hatched a plan at an apple orchard: build a highly adaptable 3D vision AI system for automating field tasks. Verdant, based in the San Francisco Bay Area, is developing AI Read article >
Tens of thousands of AWS customers use AWS machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. For customers who have been developing ML models on premises, such as their local desktop, they want to migrate their legacy ML models to the AWS Cloud to fully take advantage of […]
Hey r/MachineLearning,
We are collecting a hand-crafted curated list of awesome curated lists closely related to machine learning.
Here is the link to the Github repo: https://github.com/zhimin-z/awesome-awesome-machine-learning
Do any lists need to be included from your perspective? Please let me know, or feel free to submit a pull request.
The motivation underlying this project is that so many awesome lists regarding machine learning exist on GitHub. But it gradually becomes a burden to remember where to look as the ML world progresses faster and faster these days.
Hence this project: a unification that stitches together all awesome lists closely related to machine learning.
submitted by /u/happybirdie007
Learn how to create mind-blowing AI art with just a few keywords! This guide will show you how to use an AI model to generate stunning digital art, step by step!
https://youtu.be/HmrqjqyxeCo
submitted by /u/TheQuestionStation
Today, tens of thousands of customers are building, training, and deploying machine learning (ML) models using Amazon SageMaker to power applications that have the potential to reinvent their businesses and customer experiences. These ML models have been increasing in size and complexity over the last few years, which has led to state-of-the-art accuracies across a […]
When I was getting my MBA at the University of Iowa in 1981, my advisor Gary Fethke (who would later serve as University of Iowa interim president and Emeritus Professor in Business Analytics) convinced me to take a PhD class in econometrics. I think he was trying to punish me or something. I was totally… Read More »Future of Education: Application not Regurgitation of Knowledge – Part I
The post Future of Education: Application not Regurgitation of Knowledge – Part I appeared first on Data Science Central.
As part of my teaching for AI at the University of Oxford, I read a large number of books based on the maths of data science. Data Science and Machine Learning: Mathematical and Statistical Methods is a book I recommend if you like the maths of data science. There is a pdf… Read More »Data Science and Machine Learning Mathematical and Statistical Methods
Announcements Our Revamped Submission Guidelines Since our migration to WordPress, we have been looking to solidify a set of guidelines for writers to look at prior to submitting that will give them a rough idea of the quality standards the editors are looking for. Many of you will be familiar with our Tips and Tricks… Read More »DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines
Paper - https://arxiv.org/abs/2303.05398
submitted by /u/MysteryInc152
Researchers used machine learning to build faster and more efficient hash functions, which are a key component of databases.
This post is co-written with Mahima Agarwal, Machine Learning Engineer, and Deepak Mettem, Senior Engineering Manager, at VMware Carbon Black VMware Carbon Black is a renowned security solution offering protection against the full spectrum of modern cyberattacks. With terabytes of data generated by the product, the security analytics team focuses on building machine learning (ML) […]
Amazon SageMaker Ground Truth Plus is a managed data labeling service that makes it easy to label data for machine learning (ML) applications. One common use case is semantic segmentation, which is a computer vision ML technique that involves assigning class labels to individual pixels in an image. For example, in video frames captured by […]
Remote work has skyrocketed in the last three years. And with that comes increased productivity, happier employees, and lower overhead costs. But unfortunately, it’s not all sunshine and rainbows for companies with remote teams. Studies show that employees working from home increase the frequency of cyberattacks by 238%. And with the global average… Read More »How to Implement a Data Privacy and Protection Strategy for Remote Teams
We introduce weak barycenters of a family of probability distributions, based
on the recently developed notion of optimal weak transport of mass by Gozlan
et al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical
analysis of this object and discuss its interpretation in the light of convex
ordering between probability measures. In particular, we show that, rather than
averaging the input distributions in a geometric way (as the Wasserstein
barycenter based on classic optimal transport does), weak barycenters extract
common geometric information shared by all the input distributions, encoded as
a latent random variable that underlies all of them. We also provide an
iterative algorithm to compute a weak barycenter for a finite family of input
distributions, and a stochastic algorithm that computes them for arbitrary
populations of laws. The latter approach is particularly well suited for the
streaming setting, i.e., when distributions are observed sequentially. The
notion of weak barycenter and our approaches to compute it are illustrated on
synthetic examples, validated on 2D real-world data and compared to standard
Wasserstein barycenters.
With the development of hardware accelerators and their corresponding tools,
evaluations have become more affordable through fast and massively parallel
evaluations in some applications. This advancement has drastically sped up the
runtime of evolution-inspired algorithms such as Quality-Diversity
optimization, creating tremendous potential for algorithmic innovation through
scale. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD
algorithm based on Evolution Strategies (ES) designed for fast parallel
evaluations. MEMES builds on top of the existing MAP-Elites-ES algorithm,
scaling it by maintaining multiple independent ES threads with massive
parallelization. We also introduce a new dynamic reset procedure for the
lifespan of the independent ES to autonomously maximize the improvement of the
QD population. We show experimentally that MEMES outperforms existing
gradient-based and objective-agnostic QD algorithms when compared in terms of
generations. We perform this comparison on both black-box optimization and
QD-Reinforcement Learning tasks, demonstrating the benefit of our approach
across different problems and domains. Finally, we also find that our approach
intrinsically enables optimization of fitness locally around a niche, a
phenomenon not observed in other QD algorithms.
This tutorial introduces the CMA Evolution Strategy (ES), where CMA stands
for Covariance Matrix Adaptation. The CMA-ES is a stochastic, or randomized,
method for real-parameter (continuous domain) optimization of non-linear,
non-convex functions. We try to motivate and derive the algorithm from
intuitive concepts and from requirements of non-linear, non-convex search in
continuous domain.
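The full CMA-ES adapts a complete covariance matrix, which is beyond a short snippet, but the flavour of randomized continuous search it describes can be illustrated with a much simpler relative: a (1+1)-ES with the classic one-fifth success rule for step-size adaptation (an illustrative sketch only, not CMA-ES itself; the function and parameter names are ours):

```python
import random

def one_plus_one_es(f, x0, sigma=0.5, iters=200, seed=0):
    """(1+1)-ES: mutate one parent, keep the offspring if it is no worse,
    and adapt the step size sigma with the one-fifth success rule."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]  # Gaussian mutation
        fy = f(y)
        if fy <= fx:                  # success: accept and widen the search
            x, fx = y, fy
            sigma *= 1.5
        else:                         # failure: shrink the step size
            sigma *= 1.5 ** -0.25
    return x, fx

# Minimize the sphere function as a smoke test.
sphere = lambda v: sum(t * t for t in v)
x_best, f_best = one_plus_one_es(sphere, [3.0, -2.0])
```

The covariance adaptation that gives CMA-ES its name replaces the single scalar sigma with a full matrix shaping the mutation distribution.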
The use of unlicensed spectrum for cellular systems to mitigate spectrum
scarcity has led to the development of intelligent adaptive approaches to
spectrum access that improve upon traditional carrier sensing and
listen-before-talk methods. We study decentralized contention-based medium
access for base stations (BSs) of a single Radio Access Technology (RAT)
operating on unlicensed shared spectrum. We devise a distributed deep
reinforcement learning-based algorithm for both contention and adaptive
modulation, modelled on a two-state Markov decision process, that attempts to
maximize a network-wide downlink throughput objective. Empirically, we find the
(proportional fairness) reward accumulated by a policy gradient approach to be
significantly higher than even a genie-aided adaptive energy detection
threshold. Our approaches are further validated by improved sum and peak
throughput. The scalability of our approach to large networks is demonstrated
via an improved cumulative reward earned on both indoor and outdoor layouts
with a large number of BSs.
It is common to utilise dynamic models to measure the tyre-road friction in
real-time. Alternatively, predictive approaches estimate the tyre-road friction
by identifying the environmental factors affecting it. This work aims to
formulate the problem of friction estimation as a visual perceptual learning
task. The problem is broken down into detecting surface characteristics by
applying semantic segmentation and using the extracted features to predict the
frictional force. This work for the first time formulates the friction
estimation problem as a regression from the latent space of a semantic
segmentation model. The preliminary results indicate that this approach can
estimate frictional force.
In this case study we trained and published a state-of-the-art open-source
model for Automatic Speech Recognition (ASR) for German to evaluate the current
potential of this technology for the use in the larger context of Digital
Humanities and cultural heritage indexation. Along with this paper we publish
our wav2vec2 based speech to text model while we evaluate its performance on a
corpus of historical recordings we assembled compared against commercial
cloud-based and proprietary services. While our model achieves moderate
results, we see that proprietary cloud services fare significantly better. As
our results show, recognition rates over 90 percent can currently be achieved,
however, these numbers drop quickly once the recordings feature limited audio
quality or use of non-everyday or outworn language. A big issue is the high
variety of different dialects and accents in the German language. Nevertheless,
this paper highlights that the currently available quality of recognition is
high enough to address various use cases in the Digital Humanities. We argue
that ASR will become a key technology for the documentation and analysis of
audio-visual sources and identify an array of important questions that the DH
community and cultural heritage stakeholders will have to address in the near
future.
General robotic grippers are challenging to control because of their rich
nonsmooth contact dynamics and the many sources of uncertainties due to the
environment or sensor noise. In this work, we demonstrate how to compute 6-DoF
grasp poses using simulation-based Bayesian inference through the full
stochastic forward simulation of the robot in its environment while robustly
accounting for many of the uncertainties in the system. A Riemannian manifold
optimization procedure preserving the nonlinearity of the rotation space is
used to compute the maximum a posteriori grasp pose. Simulation and physical
benchmarks show the promising high success rate of the approach.
When dealing with electro or magnetoencephalography records, many supervised
prediction tasks are solved by working with covariance matrices to summarize
the signals. Learning with these matrices requires using Riemannian geometry to
account for their structure. In this paper, we propose a new method to deal
with distributions of covariance matrices and demonstrate its computational
efficiency on M/EEG multivariate time series. More specifically, we define a
Sliced-Wasserstein distance between measures of symmetric positive definite
matrices that comes with strong theoretical guarantees. Then, we take advantage
of its properties and kernel methods to apply this distance to brain-age
prediction from MEG data and compare it to state-of-the-art algorithms based on
Riemannian geometry. Finally, we show that it is an efficient surrogate to the
Wasserstein distance in domain adaptation for Brain Computer Interface
applications.
An efficient deep learning model that can be implemented in real-time for
polyp detection is crucial to reducing polyp miss-rate during screening
procedures. Convolutional neural networks (CNNs) are vulnerable to small
changes in the input image. A CNN-based model may miss the same polyp appearing
in a series of consecutive frames and produce unstable detection output due to
changes in camera pose, lighting condition, light reflection, etc. In this
study, we attempt to tackle this problem by integrating temporal information
among neighboring frames. We propose an efficient feature concatenation method
for a CNN-based encoder-decoder model without adding complexity to the model.
The proposed method incorporates extracted feature maps of previous frames to
detect polyps in the current frame. The experimental results demonstrate that
the proposed method of feature concatenation improves the overall performance
of automatic polyp detection in videos. The following results are obtained on a
public video dataset: sensitivity 90.94%, precision 90.53%, and specificity
92.46%.
Accuracy validation of cortical thickness measurement is a difficult problem
due to the lack of ground truth data. To address this need, many methods have
been developed to synthetically induce gray matter (GM) atrophy in an MRI via
deformable registration, creating a set of images with known changes in
cortical thickness. However, these methods often cause blurring in atrophied
regions, and cannot simulate realistic atrophy within deep sulci where
cerebrospinal fluid (CSF) is obscured or absent. In this paper, we present a
solution using a self-supervised inpainting model to generate CSF in these
regions and create images with more plausible GM/CSF boundaries. Specifically,
we introduce a novel, 3D GAN model that incorporates patch-based dropout
training, edge map priors, and sinusoidal positional encoding, all of which are
established methods previously limited to 2D domains. We show that our
framework significantly improves the quality of the resulting synthetic images
and is adaptable to unseen data with fine-tuning. We also demonstrate that our
resulting dataset can be employed for accuracy validation of cortical
segmentation and thickness measurement.
We provide an example of a distribution preserving source separation method,
which aims at addressing perceptual shortcomings of state-of-the-art methods.
Our approach uses unconditioned generative models of signal sources.
Reconstruction is achieved by means of mix-consistent sampling from a
distribution conditioned on a realization of a mix. The separated signals
follow their respective source distributions, which provides an advantage when
separation results are evaluated in a listening test.
3D human mesh recovery from a 2D pose plays an important role in various
applications. However, it is hard for existing methods to simultaneously
capture the multiple relations during the evolution from skeleton to mesh,
including joint-joint, joint-vertex and vertex-vertex relations, which often
leads to implausible results. To address this issue, we propose a novel
solution, called GATOR, that contains an encoder of Graph-Aware Transformer
(GAT) and a decoder with Motion-Disentangled Regression (MDR) to explore these
multiple relations. Specifically, GAT combines a GCN and a graph-aware
self-attention in parallel to capture physical and hidden joint-joint
relations. Furthermore, MDR models joint-vertex and vertex-vertex interactions
to explore joint and vertex relations. Based on the clustering characteristics
of vertex offset fields, MDR regresses the vertices by composing the predicted
base motions. Extensive experiments show that GATOR achieves state-of-the-art
performance on two challenging benchmarks.
Modelling dynamical systems is an integral component for understanding the
natural world. To this end, neural networks are becoming an increasingly
popular candidate owing to their ability to learn complex functions from large
amounts of data. Despite this recent progress, there has not been an adequate
discussion on the architectural regularization that neural networks offer when
learning such systems, hindering their efficient usage. In this paper, we
initiate a discussion in this direction using coordinate networks as a test
bed. We interpret dynamical systems and coordinate networks from a signal
processing lens, and show that simple coordinate networks with few layers can
be used to solve multiple problems in modelling dynamical systems, without any
explicit regularizers.
Agglomerative hierarchical clustering based on Ordered Weighted Averaging
(OWA) operators not only generalises the single, complete, and average
linkages, but also includes intercluster distances based on a few nearest or
farthest neighbours, trimmed and winsorised means of pairwise point
similarities, amongst many others. We explore the relationships between the
famous Lance-Williams update formula and the extended OWA-based linkages with
weights generated via infinite coefficient sequences. Furthermore, we provide
some conditions for the weight generators to guarantee the resulting
dendrograms to be free from unaesthetic inversions.
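A tiny sketch of the idea (our own illustration, not the paper's code): an OWA operator sorts the pairwise intercluster distances and applies a weight vector, so concentrating all weight on the largest or smallest entry recovers the complete and single linkages, while uniform weights recover average linkage:

```python
def owa(values, weights):
    """Ordered Weighted Averaging: sort values in decreasing order,
    then take the weighted sum with the given weight vector."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

# Pairwise distances between the points of two clusters:
pairwise = [2.0, 5.0, 3.0]
complete = owa(pairwise, [1, 0, 0])        # all weight on the maximum
single = owa(pairwise, [0, 0, 1])          # all weight on the minimum
average = owa(pairwise, [1/3, 1/3, 1/3])   # uniform weights
```

Intermediate weight vectors (e.g. weight spread over the few nearest neighbours, or trimmed means) give the generalised linkages the abstract describes.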
We propose a new 6-DoF grasp pose synthesis approach from 2D/2.5D input based
on keypoints. A keypoint-based grasp detector from image input demonstrated
promising results in a previous study, where the additional visual
information provided by color images compensates for the noisy depth
perception. However, it relies heavily on accurately predicting the location of
keypoints in the image space. In this paper, we devise a new grasp generation
network that reduces the dependency on precise keypoint estimation. Given an
RGB-D input, our network estimates both the grasp pose from keypoint detection
as well as scale towards the camera. We further re-design the keypoint output
space in order to mitigate the negative impact of keypoint prediction noise on
the Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method
outperforms the baseline by a large margin, validating the efficacy of our
approach. Finally, despite being trained on simple synthetic objects, our method
demonstrates sim-to-real capacity by showing competitive results in real-world
robot experiments.
Despite the impressive performance of vision-based pose estimators, they
generally fail to perform well under adverse vision conditions and often don't
satisfy the privacy demands of customers. As a result, researchers have begun
to study tactile sensing systems as an alternative. However, these systems
suffer from noisy and ambiguous recordings. To tackle this problem, we propose
a novel solution for pose estimation from ambiguous pressure data. Our method
comprises a spatio-temporal vision transformer with an encoder-decoder
architecture. Detailed experiments on two popular public datasets reveal that
our model outperforms existing solutions in the area. Moreover, we observe that
increasing the number of temporal crops in the early stages of the network
positively impacts the performance while pre-training the network in a
self-supervised setting using a masked auto-encoder approach also further
improves the results.
Rainfall data collected by various remote sensing instruments such as radars
or satellites has different space-time resolutions. This study aims to improve
the temporal resolution of radar rainfall products to help with more accurate
climate change modeling and studies. In this direction, we introduce a solution
based on EfficientNetV2, namely EfficientTempNet, to increase the temporal
resolution of radar-based rainfall products from 10 minutes to 5 minutes. We
tested EfficientTempNet on a dataset for the state of Iowa, US, and compared
its performance to three different baselines to show that EfficientTempNet
presents a viable option for better climate change monitoring.
Tensor decomposition is now being used for data analysis, information
compression, and knowledge recovery. However, the mathematical properties of
tensor decomposition are not yet fully clarified, because it is a singular
learning machine. In this paper, we give an upper bound on the real log
canonical threshold (RLCT) of tensor decomposition by using an algebraic
geometrical method, and derive its Bayesian generalization error theoretically.
We also discuss its mathematical properties through numerical experiments.
Automatic Speech Recognition (ASR) in medical contexts has the potential to
save time, cut costs, increase report accuracy, and reduce physician burnout.
However, the healthcare industry has been slower to adopt this technology, in
part due to the importance of avoiding medically-relevant transcription
mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR
metric that penalizes clinically-relevant mistakes more than others. We
demonstrate that this metric more closely aligns with clinician preferences on
medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.),
sometimes by wide margins. We collect a benchmark of 13 clinician preferences
on 149 realistic medical sentences called the Clinician Transcript Preference
benchmark (CTP), demonstrate that CBERTScore more closely matches what
clinicians prefer, and release the benchmark for the community to further
develop clinically-aware ASR metrics.
Classical multidimensional scaling (CMDS) is a technique that aims to embed a
set of objects in a Euclidean space given their pairwise Euclidean distance
matrix. The main part of CMDS is based on double centering a squared distance
matrix and employing a truncated eigendecomposition to recover the point
coordinates. A central result in CMDS connects the squared Euclidean matrix to
a Gram matrix derived from the set of points. In this paper, we study a dual
basis approach to classical multidimensional scaling. We give an explicit
formula for the dual basis and fully characterize the spectrum of an essential
matrix in the dual basis framework. We make connections to a related problem in
metric nearness.
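The double-centering construction described above can be sketched in a few lines of NumPy (a minimal illustration of classical MDS under our own naming, not the paper's dual-basis method):

```python
import numpy as np

def classical_mds(D2, k=2):
    """Recover point coordinates from a squared Euclidean distance matrix D2
    via double centering and a truncated eigendecomposition."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # Gram matrix of centered points
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]            # keep the top-k eigenpairs
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

# Distances of a planar configuration are reproduced exactly (up to rotation).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Y = classical_mds(D2, k=2)
D2_rec = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
```

For a distance matrix that is exactly Euclidean and of embedding dimension at most k, the recovered distances match the inputs up to numerical precision.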
Unfolding networks have shown promising results in the Compressed Sensing
(CS) field. Yet, the investigation of their generalization ability is still in
its infancy. In this paper, we perform generalization analysis of a
state-of-the-art ADMM-based unfolding network, which jointly learns a decoder
for CS and a sparsifying redundant analysis operator. To this end, we first
impose a structural constraint on the learnable sparsifier, which parametrizes
the network's hypothesis class. For the latter, we estimate its Rademacher
complexity. With this estimate in hand, we deliver generalization error bounds
for the examined network. Finally, the validity of our theory is assessed and
numerical comparisons to a state-of-the-art unfolding network are made, on
synthetic and real-world datasets. Our experimental results demonstrate that
our proposed framework complies with our theoretical findings and outperforms
the baseline, consistently for all datasets.
In recent years, knowledge distillation has become a cornerstone of
efficiently deployed machine learning, with labs and industries using knowledge
distillation to train models that are inexpensive and resource-optimized.
Trojan attacks have contemporaneously gained significant prominence, revealing
fundamental vulnerabilities in deep learning models. Given the widespread use
of knowledge distillation, in this work we seek to exploit the unlabelled data
knowledge distillation process to embed Trojans in a student model without
introducing conspicuous behavior in the teacher. We ultimately devise a Trojan
attack that effectively reduces student accuracy, does not alter teacher
performance, and is efficiently constructible in practice.
The estimation of probability density functions is a non-trivial task that
in recent years has been tackled with machine learning techniques.
Successful applications can be obtained using models inspired by the Boltzmann
machine (BM) architecture. In this manuscript, the product Jacobi-Theta
Boltzmann machine (pJTBM) is introduced as a restricted version of the
Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection
matrix. We show that score matching, based on the Fisher divergence, can be
used to fit probability densities with the pJTBM more efficiently than with the
original RTBM.
I put together this plain pytorch implementation of LLaMA (I just substituted the fairscale layers with the native ones and converted the weights accordingly) that can be more easily run in different environments.
The big problem with the official implementation is that to run the 65B version you need 8 GPUs no matter what, to run the 30B version you need 4, and so on. In reality you can easily fit the 65B version in 2 A100s with 100G of VRAM.
vanilla-llama solves this problem. You just need to have enough total memory, and the model will be loaded across all the available GPUs.
https://github.com/galatolofederico/vanilla-llama
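The core trick, fill one GPU and spill the remaining layers onto the next, can be sketched as a toy placement routine (our own illustration, not the actual code from the repo):

```python
def assign_layers(layer_sizes, gpu_capacities):
    """Greedily place consecutive layers onto GPUs: move to the next GPU
    once the current one cannot hold the next layer."""
    placement, gpu, used = [], 0, 0
    for size in layer_sizes:
        if used + size > gpu_capacities[gpu] and gpu + 1 < len(gpu_capacities):
            gpu, used = gpu + 1, 0   # spill over to the next GPU
        placement.append(gpu)
        used += size
    return placement

# Four equal layers split across two GPUs of equal capacity:
plan = assign_layers([4, 4, 4, 4], [8, 8])
```

With this scheme the number of GPUs only needs to be large enough in aggregate memory, rather than fixed per model size.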
submitted by /u/poppear
We take a closer look at Aicolumns, an online platform dedicated to artificial intelligence. Discover the latest AI tools, trends, and insights from a team of expert writers. Whether you're a seasoned AI professional or just starting out, aicolumns.com is your ultimate guide to all things AI.
https://youtu.be/927XESjV3kg
submitted by /u/Bassissou23
https://github.com/jacobgil/confidenceinterval
pip install confidenceinterval
tldr: You don't have an excuse anymore not to use confidence intervals!
In statistics, confidence intervals are commonly reported alongside accuracy metrics to help interpret them.
For example, an AUC metric might be 0.9, but if the 95% confidence interval is in the range [0.7, 0.96], we can't confidently say we didn't just get lucky - we should be really careful making decisions around that result.
More formally, a confidence interval gives us a range on where the true unknown accuracy metric could be, and a 95% confidence interval means that if we repeated the experiment many times, 95% of the confidence intervals we reported would contain the actual true metric (which is unknown) - a property called coverage.
…
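For intuition, a percentile-bootstrap interval for accuracy can be computed in a few lines of plain Python (a generic sketch, not the `confidenceinterval` package's API):

```python
import random

def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for classification accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        scores.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    scores.sort()
    lo = scores[int(alpha / 2 * n_resamples)]
    hi = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# 90 correct predictions out of 100:
y_true = [1] * 100
y_pred = [1] * 90 + [0] * 10
lo, hi = bootstrap_accuracy_ci(y_true, y_pred)
```

The library linked above also provides analytic (e.g. Wilson-style) intervals, which are cheaper than resampling for simple metrics.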
Decompose Python libraries and generate Coherent hierarchical topic models of the repository.
https://github.com/danielpatrickhug/GitModel
The ability to bootstrap its own codebase is a powerful feature: GitModel can use its own output as input to improve itself. By generating hierarchical topic trees of GitHub repositories, it can analyze and extract insights from its own codebase (and others) to improve its functionality, leading to more efficient code generation, better semantic graph generation, and improved text generation capabilities.
I spent around 10 hours today on a major refactor creating a simple pipeline abstraction and allowing dynamic instantiation from yaml configs. It now also supports multiple GNN heads.
Please try it out and let me know what you think!
Example:
https://github.com/deepmind/clrs
submitted by /u/NovelspaceOnly
Midjourney seems to consistently have the best results. Have had very mixed results with Stable Diffusion, Lexica, and others like OpenJourney.
What model is closest to Midjourney's results but is open source and/or has an API?
submitted by /u/sideprojects_ai
We study a heterogeneous agent macroeconomic model with an infinite number of
households and firms competing in a labor market. Each household earns income
and engages in consumption at each time step while aiming to maximize a concave
utility subject to the underlying market conditions. The households aim to find
the optimal saving strategy that maximizes their discounted cumulative utility
given the market condition, while the firms determine the market conditions
through maximizing corporate profit based on the household population behavior.
The model captures a wide range of applications in macroeconomic studies, and
we propose a data-driven reinforcement learning framework that finds the
regularized competitive equilibrium of the model. The proposed algorithm enjoys
theoretical guarantees in converging to the equilibrium of the market at a
sub-linear rate.
Bayesian Causal Forests (BCF) is a causal inference machine learning model
based on a highly flexible non-parametric regression and classification tool
called Bayesian Additive Regression Trees (BART). Motivated by data from the
Trends in International Mathematics and Science Study (TIMSS), which includes
data on student achievement in both mathematics and science, we present a
multivariate extension of the BCF algorithm. With the help of simulation
studies we show that our approach can accurately estimate causal effects for
multiple outcomes subject to the same treatment. We also apply our model to
Irish data from TIMSS 2019. Our findings reveal the positive effects of having
access to a study desk at home (Mathematics ATE 95% CI: [0.20, 11.67]) while
also highlighting the negative consequences of students often feeling hungry at
school (Mathematics ATE 95% CI: [-11.15, -2.78] , Science ATE 95% CI:
[-10.82,-1.72]) or often being absent (Mathematics ATE 95% CI: [-12.47,
-1.55]).
We introduce a class of networked Markov potential games where agents are
associated with nodes in a network. Each agent has its own local potential
function, and the reward of each agent depends only on the states and actions
of agents within a $\kappa$-hop neighborhood. In this context, we propose a
localized actor-critic algorithm. The algorithm is scalable since each agent
uses only local information and does not need access to the global state.
Further, the algorithm overcomes the curse of dimensionality through the use of
function approximation. Our main results provide finite-sample guarantees up to
a localization error and a function approximation error. Specifically, we
achieve an $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity measured by
the averaged Nash regret. This is the first finite-sample bound for multi-agent
competitive games that does not depend on the number of agents.
A rigorous formalization of desired system requirements is indispensable when
performing any verification task. This often limits the application of
verification techniques, as writing formal specifications is an error-prone and
time-consuming manual task. To facilitate this, we present nl2spec, a framework
for applying Large Language Models (LLMs) to derive formal specifications (in
temporal logics) from unstructured natural language. In particular, we
introduce a new methodology to detect and resolve the inherent ambiguity of
system requirements in natural language: we utilize LLMs to map subformulas of
the formalization back to the corresponding natural language fragments of the
input. Users iteratively add, delete, and edit these sub-translations to amend
erroneous formalizations, which is easier than manually redrafting the entire
formalization. The framework is agnostic to specific application domains and
can be extended to similar specification languages and new neural models. We
perform a user study to obtain a challenging dataset, which we use to run
experiments on the quality of translations. We provide an open-source
implementation, including a web-based frontend.
Blackwell's approachability is a very general sequential decision framework
where a Decision Maker obtains vector-valued outcomes, and aims at the
convergence of the average outcome to a given "target" set. Blackwell gave a
sufficient condition for the Decision Maker to have a strategy guaranteeing
such convergence against an adversarial environment, as well as what we now
call Blackwell's algorithm, which then ensures convergence. Blackwell's
approachability has since been applied to numerous problems, in online learning
and game theory, in particular. We extend this framework by allowing the
outcome function and the dot product to be time-dependent. We establish a
general guarantee for the natural extension to this framework of Blackwell's
algorithm. In the case where the target set is an orthant, we present a family
of time-dependent dot products which yields different convergence speeds for
each coordinate of the average outcome. We apply this framework to the Big
Match (one of the most important toy examples of stochastic games) where an
$\epsilon$-uniformly optimal strategy for Player I is given by Blackwell's
algorithm in a well-chosen auxiliary approachability problem.
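As a reminder of the classical statement (in our own notation; the extension above replaces the fixed dot product below with a time-dependent one): a closed convex set $\mathcal{C}$ is approachable if, for every point $x \notin \mathcal{C}$ with projection $y = \Pi_{\mathcal{C}}(x)$, the Decision Maker has a mixed action $p$ such that
$$\langle x - y,\; m(p, q) - y \rangle \le 0 \quad \text{for all adversary actions } q,$$
i.e., the expected outcome $m(p,q)$ lies on the $\mathcal{C}$-side of the hyperplane through $y$ orthogonal to $x - y$. Blackwell's algorithm plays such a $p$ whenever the current average outcome lies outside $\mathcal{C}$, which forces the average outcome to converge to $\mathcal{C}$.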
Samples can be found here and here. See how they compare to the original chorales and fugues.
The model uses a Transformer encoder architecture to complete partially corrupted sequence representations of music. A version of Gibbs sampling is then used to construct new music from scratch. The entire model was trained in under 30 minutes on a single Tesla V100 - really showcasing the efficiency of Transformers in general.
Note that the fugue samples are seeded by the first three bars of an actual Bach fugue. The chorales are generated completely from scratch!
For more information on how it works - see the GitHub repo or follow me on Twitter.
submitted by /u/ustainbolt
I recently delved into the world of transformers and their application to vision tasks.
As part of my learning process, I implemented the Vision Transformer (ViT) from scratch using PyTorch. I am sharing my implementation and a step-by-step guide to implementing the model in this post.
I hope you find it helpful.
Github: https://github.com/tintn/vision-transformer-from-scratch
Post: https://medium.com/towards-data-science/implementing-vision-transformer-vit-from-scratch-3e192c6155f0
submitted by /u/Tin_Ng
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Data Wrangler enables you to access data from a wide variety of popular sources (Amazon S3, Amazon Athena, Amazon Redshift, Amazon EMR and Snowflake) and over 40 other third-party sources. […]
In this two-part series, we demonstrate how to label and train models for 3D object detection tasks. In part 1, we discuss the dataset we’re using, as well as any preprocessing steps, to understand and label data. In part 2, we walk through how to train a model on your dataset and deploy it to […]
Online fraud has a widespread impact on businesses and requires an effective end-to-end strategy to detect and prevent new account fraud and account takeovers, and stop suspicious payment transactions. In this post, we show a serverless approach to detect online transaction fraud in near-real time. We show how you can apply this approach to various data streaming and event-driven architectures, depending on the desired outcome and actions to take to prevent fraud (such as alert the user about the fraud or flag the transaction for additional review).
Aleksander Mądry urges lawmakers to ask rigorous questions about how AI tools are being used by corporations.
The computer science and philosophy double-major aims to advance the field of AI ethics.
The AI landscape is being reshaped by the rise of generative models capable of synthesizing high-quality data, such as text, images, music, and videos. The course toward democratization of AI helped to further popularize generative AI following the open-source releases for such foundation model families as BERT, T5, GPT, CLIP and, most recently, Stable Diffusion. […]
As machine learning (ML) models have improved, data scientists, ML engineers and researchers have shifted more of their attention to defining and bettering data quality. This has led to the emergence of a data-centric approach to ML and various techniques to improve model performance by focusing on data requirements. Applying these techniques allows ML practitioners […]
Aided by machine learning, scientists are working to develop a vaccine that would be effective against all SARS-Cov-2 strains.
It’s a thrilling GFN Thursday with GRID Legends racing to the cloud this week. It leads a total of eight new games expanding the GeForce NOW library. New content for Rainbow Six Siege is also now streaming. Plus, two new cities are now online with GeForce RTX 4080 performance for cloud gaming: Chicago and Montreal.
Hi, I work at Intel as an academic outreach coordinator. I'm sharing Intel's open-source OpenVINO toolkit for optimizing and deploying AI inference on CPUs, discrete and integrated GPUs, and other accelerators like Movidius VPUs and Intel FPGAs. The GitHub repo has over 60 Jupyter notebooks that work on Intel PCs/laptops running Windows or Linux, or on Macs running macOS, including M1 processors.
Try out the Stable Diffusion Jupyter Notebook #225, or the vehicle recognition and detection Jupyter Notebook #218.
It's easy to install with pip: 9 simple steps on Windows, 8 on macOS, and 7 on Linux.
submitted by /u/JayMBurris
I had to do a couple of tries but I think overall the results are impressive. Here it is:
https://www.youtube.com/watch?v=LcrLopIoJeA&t=14s&ab_channel=Triviadetodo
submitted by /u/laburanta
MIT researchers uncover the structural properties and dynamics of deep classifiers, offering novel explanations for optimization, generalization, and approximation in deep networks.
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. SageMaker provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so […]
This post is co-authored with Hernan Figueroa, Sr. Manager Data Science at Marubeni Power International. Marubeni Power International Inc (MPII) owns and invests in power business platforms in the Americas. An important vertical for MPII is asset management for renewable energy and energy storage assets, which are critical to reduce the carbon intensity of our […]
Reinforcement learning (RL) encompasses a class of machine learning (ML) techniques that can be used to solve sequential decision-making problems. RL techniques have found widespread applications in numerous domains, including financial services, autonomous navigation, industrial control, and e-commerce. The objective of an RL problem is to train an agent that, given an observation from its […]
Dear community,
I have written a Medium article on the top 5 resources that helped me learn deep reinforcement learning fast, going from zero (I was previously a researcher in Bayesian optimization) to doing research in these topics: https://medium.com/@eduardogarrido90/you-can-do-it-top-5-resources-to-easily-learn-deep-reinforcement-learning-d0bdef295cc6. I hope you like it.
Best,
submitted by /u/EduCGM
Hi all,
I've been uploading blog posts to Medium summarizing the content from Reinforcement Learning, 2nd Edition, along with code examples.
When I was first learning about RL, I wished someone had done this, so I decided to do it in the hope it might help anyone who's just getting started. I've summarized up to chapter 4 so far, and the posts can be found here: https://medium.com/@numsmt2
I plan on going through the entire book. Hope this helps!
submitted by /u/Common-Mushroom2333
The authors of fastdup ran an analysis on LAION 400M and ImageNet21K. Here's what they found.
Analysing LAION
LAION 400M - TLDR video.
60M duplicates.
962K broken images.
Various label discrepancies.
ImageNet21K - Link to blog post.
1.2M duplicate images.
104K train/val leak.
GitHub repo - https://github.com/visual-layer/fastdup
submitted by /u/WatercressTraining
The Academy Award nominations are in — and for the 15th year in a row, NVIDIA technologies worked behind the scenes of every film nominated for Best Visual Effects. The five VFX contenders for the 95th annual Academy Awards, taking place on Sunday, March 12, include: All Quiet on the Western Front, Avatar: The Way …
An adrenaline-fueled virtual ride in the sky is sure to satisfy all thrill seekers — courtesy of 3D artist Kosei Wano’s sensational animation, Moon Hawk. Wano outlines his creative workflow this week In the NVIDIA Studio.
Preparing a retailer’s online catalog once required expensive physical photoshoots to capture products from every angle. A Tel Aviv startup is saving brands time and money by transforming these camera clicks into mouse clicks. Hexa uses GPU-accelerated computing to help companies turn their online inventory into 3D renders that shoppers can view in 360 degrees…
Announcements Repetitions of History: Can You Trust Your Eyes (or Ears)? We find ourselves today in a conversation similar to the one that occurred in the 1880s when photography became widespread. Artists and critics derided photography because it lacked “that refined feeling and sentiment which animate the productions of a man of genius.” They believed photography lacked a…
The post DSC Weekly 7 March 2023 – Repetitions of History: Can You Trust Your Eyes (or Ears)? appeared first on Data Science Central.
Deploying models at scale can be a cumbersome task for many data scientists and machine learning engineers. However, Amazon SageMaker endpoints provide a simple solution for deploying and scaling your machine learning (ML) model inferences. Our last blog post and GitHub repo on hosting a YOLOv5 TensorFlowModel on Amazon SageMaker Endpoints sparked a lot of interest […]
This post presents and compares options and recommended practices on how to manage Python packages and virtual environments in Amazon SageMaker Studio notebooks. A public GitHub repo provides hands-on examples for each of the presented approaches. Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, […]
The Amazon International Seller Growth (ISG) team runs the CSBA (Customer Service by Amazon) program that supports over 200,000 third-party Merchant Fulfilled Network (MFN) sellers. Amazon call centers facilitate hundreds of thousands of phone calls, chats, and emails going between the consumers and Amazon MFN sellers. The large volume of contacts creates a challenge for […]
Yammer is a social networking platform designed for open and dynamic communications and collaborations within organizations. It allows you to build communities of interest, gather ideas and feedback, and keep everyone informed. It’s available via browser or mobile app, and provides a variety of common social networking features such as private and public communities, news […]
We consider a reinforcement learning setting in which the deployment
environment is different from the training environment. Applying a robust
Markov decision processes formulation, we extend the distributionally robust
$Q$-learning framework studied in Liu et al. [2022]. Further, we improve the
design and analysis of their multi-level Monte Carlo estimator. Assuming access
to a simulator, we prove that the worst-case expected sample complexity of our
algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error
in the sup norm is upper bounded by $\tilde
O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where
$\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support
probability of the transition kernels and $\delta$ is the uncertainty size.
This is the first sample complexity result for the model-free robust RL
problem. Simulation studies further validate our theoretical results.
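The robust Bellman operator behind this framework can be illustrated with a toy sketch: worst-case Q-iteration over a finite set of candidate transition kernels. This is a deliberate simplification of the paper's setting (which uses $p_{\wedge}$-constrained uncertainty sets and a multi-level Monte Carlo estimator with simulator access); all names below are ours.

```python
import numpy as np

def robust_q_iteration(r, kernels, gamma=0.9, iters=500):
    """Worst-case (distributionally robust) Q-iteration over a finite
    set of candidate transition kernels.

    r       : (S, A) reward array
    kernels : list of (S, A, S) transition arrays, one per candidate model
    """
    S, A = r.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                         # greedy value per state
        # expected next-state value under each candidate kernel: (K, S, A)
        evs = np.stack([P @ V for P in kernels])
        Q = r + gamma * evs.min(axis=0)           # adversarial kernel choice
    return Q
```

Because the minimum is taken inside every backup, the resulting Q-values are elementwise no larger than those computed under any single kernel from the set.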
Intent detection with semantically similar fine-grained intents is a
challenging task. To address it, we reformulate intent detection as a
question-answering retrieval task by treating utterances and intent names as
questions and answers. To that end, we utilize a question-answering retrieval
architecture and adopt a two-stage training scheme with batch contrastive
loss. In the pre-training stage, we improve query representations through
self-supervised training. Then, in the fine-tuning stage, we increase
contextualized token-level similarity scores between queries and answers from
the same intent. Our results on three few-shot intent detection benchmarks
achieve state-of-the-art performance.
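The batch contrastive objective in the fine-tuning stage can be sketched as follows. This is our own minimal NumPy illustration of an in-batch contrastive loss; the paper's exact loss, temperature, and architecture may differ.

```python
import numpy as np

def batch_contrastive_loss(q, a, temperature=0.1):
    """In-batch contrastive loss: each query's positive is the answer
    at the same batch index; all other answers act as negatives.

    q, a : (B, D) L2-normalised query / answer embeddings
    """
    sims = (q @ a.T) / temperature           # (B, B) similarity matrix
    sims -= sims.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))      # cross-entropy on the diagonal
```

The loss is small when each query is most similar to its own answer and grows when some other answer in the batch scores higher.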
Recent studies indicate that deep learning plays a crucial role in the
automated visual inspection of road infrastructures. However, current learning
schemes are static, implying no dynamic adaptation to users' feedback. To
address this drawback, we present a few-shot learning paradigm for the
automated segmentation of road cracks, which is based on a U-Net architecture
with recurrent residual and attention modules (R2AU-Net). The retraining
strategy dynamically fine-tunes the weights of the U-Net as a few new rectified
samples are being fed into the classifier. Extensive experiments show that the
proposed few-shot R2AU-Net framework outperforms other state-of-the-art
networks in terms of Dice and IoU metrics, on a new dataset, named CrackMap,
which is made publicly available at https://github.com/ikatsamenis/CrackMap.
This paper proposes a new GNN design strategy that relies on Context-Free
Grammars (CFGs) generating the matrix language MATLANG. It enables us to
ensure WL-expressive power, substructure counting abilities, and spectral
properties. Applying our strategy, we design the Grammatical Graph Neural
Network G$^2$N$^2$, a provably 3-WL GNN able to count cycles of length up to 6
at edge level and to realize band-pass filters. A large number of experiments
covering these properties corroborate the presented theoretical results.
The problem of optimization on Stiefel manifold, i.e., minimizing functions
of (not necessarily square) matrices that satisfy orthogonality constraints,
has been extensively studied. Yet we propose a new approach based on, for the
first time, an interplay between thoughtfully designed continuous and discrete
dynamics. It leads to a gradient-based optimizer with intrinsically added
momentum. This method exactly preserves the manifold structure but does not
require additional operation to keep momentum in the changing (co)tangent
space, and thus has low computational cost and pleasant accuracy. Its
generalization to adaptive learning rates is also demonstrated. Notable
performances are observed in practical tasks. For instance, we found that
placing orthogonal constraints on attention heads of trained-from-scratch
Vision Transformer [Dosovitskiy et al. 2022] could markedly improve its
performance, when our optimizer is used, and it is better that each head is
made orthogonal within itself but not necessarily to other heads. This
optimizer also makes the useful notion of Projection Robust Wasserstein
Distance [Paty & Cuturi 2019; Lin et al. 2020] for high-dim. optimal transport
even more effective.
We consider the problem of optimizing expensive black-box functions over
high-dimensional combinatorial spaces which arises in many science,
engineering, and ML applications. We use Bayesian Optimization (BO) and propose
a novel surrogate modeling approach for efficiently handling a large number of
binary and categorical parameters. The key idea is to select a number of
discrete structures from the input space (the dictionary) and use them to
define an ordinal embedding for high-dimensional combinatorial structures. This
allows us to use existing Gaussian process models for continuous spaces. We
develop a principled approach based on binary wavelets to construct
dictionaries for binary spaces, and propose a randomized construction method
that generalizes to categorical spaces. We provide theoretical justification to
support the effectiveness of the dictionary-based embeddings. Our experiments
on diverse real-world benchmarks demonstrate the effectiveness of our proposed
surrogate modeling approach over state-of-the-art BO methods.
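The embedding step itself can be sketched as follows. The dictionary here is hand-picked for illustration only; the paper constructs dictionaries from binary wavelets or by randomized selection.

```python
import numpy as np

def dictionary_embedding(x, dictionary):
    """Embed a binary vector as its Hamming distances to a small set of
    reference structures (the 'dictionary'), giving a low-dimensional
    ordinal representation that a continuous-space GP can consume.
    """
    return np.array([np.sum(x != d) for d in dictionary], dtype=float)

x = np.array([0, 1, 1, 0])
D = [np.array([0, 1, 1, 0]), np.array([1, 0, 0, 1])]
emb = dictionary_embedding(x, D)  # Hamming distances to the two entries
```

The embedded points then live in a low-dimensional integer lattice, where a standard Gaussian process kernel for continuous inputs can be applied directly.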
Motivated by a variety of applications, high-dimensional time series have
become an active topic of research. In particular, several methods and
finite-sample theories for individual stable autoregressive processes with
known lag have become available very recently. We, instead, consider multiple
stable autoregressive processes that share an unknown lag. We use information
across the different processes to simultaneously select the lag and estimate
the parameters. We prove that the estimated process is stable, and we establish
rates for the forecasting error that can outmatch the known rate in our
setting. Our insights on the lag selection and the stability are also of
interest for the case of individual autoregressive processes.
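A much-simplified sketch of joint lag selection: pool least-squares fits across the processes and charge a crude per-lag complexity penalty. This stands in for the paper's actual procedure and carries none of its guarantees; all names and the penalty value are ours.

```python
import numpy as np

def select_shared_lag(series_list, max_lag, penalty=0.05):
    """Choose a single AR lag shared by several processes by pooling
    least-squares residuals across them; `penalty` is a crude per-lag
    complexity charge standing in for a proper information criterion.
    """
    best_lag, best_score = None, np.inf
    for p in range(1, max_lag + 1):
        mse = 0.0
        for y in series_list:
            n = len(y)
            # lag-k regressor columns, k = 1..p, aligned with targets y[p:]
            X = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
            t = y[p:]
            coef, *_ = np.linalg.lstsq(X, t, rcond=None)
            mse += np.mean((t - X @ coef) ** 2)
        score = mse / len(series_list) + penalty * p
        if score < best_score:
            best_lag, best_score = p, score
    return best_lag
```

Pooling the residuals across processes is what lets shared information sharpen the lag choice relative to selecting a lag for each process separately.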
Large-scale linear models are ubiquitous throughout machine learning, with
contemporary application as surrogate models for neural network uncertainty
quantification; that is, the linearised Laplace method. Alas, the computational
cost associated with Bayesian linear models constrains this method's
application to small networks, small output spaces and small datasets. We
address this limitation by introducing a scalable sample-based Bayesian
inference method for conjugate Gaussian multi-output linear models, together
with a matching method for hyperparameter (regularisation) selection.
Furthermore, we use a classic feature normalisation method (the g-prior) to
resolve a previously highlighted pathology of the linearised Laplace method.
Together, these contributions allow us to perform linearised neural network
inference with ResNet-18 on CIFAR100 (11M parameters, 100 output dimensions x
50k datapoints) and with a U-Net on a high-resolution tomographic
reconstruction task (2M parameters, 251k output dimensions).
Hamiltonian mechanics is one of the cornerstones of natural sciences.
Recently there has been significant interest in learning Hamiltonian systems in
a free-form way directly from trajectory data. Previous methods have tackled
the problem of learning from many short, low-noise trajectories, but learning
from a small number of long, noisy trajectories, whilst accounting for model
uncertainty, has not been addressed. In this work, we present a Gaussian process
model for Hamiltonian systems with efficient decoupled parameterisation, and
introduce an energy-conserving shooting method that allows robust inference
from both short and long trajectories. We demonstrate the method's success in
learning Hamiltonian systems in various data settings.
The article considers semi-supervised multitask learning on a Gaussian
mixture model (GMM). Using methods from statistical physics, we compute the
asymptotic Bayes risk of each task in the regime of large datasets in high
dimension, from which we analyze the role of task similarity in learning and
evaluate the performance gain when tasks are learned together rather than
separately. In the supervised case, we derive a simple algorithm that attains
the Bayes optimal performance.
In December last year, I completed my MS in Data Science. My capstone project had to do with semantic segmentation of medical ultrasound images (TLDR: cancer detection). I used a transformer model based on SegFormer. After the project was completed, I tried to improve the model's performance a bit more.
I was surprised by the IoU performance, which seemed a little too good to be true. I ended up writing my own metrics which calculated IoU, Dice, precision, and recall, among other things. My IoU results, computed with my own code, were consistently less than the IoU results I got from the library I was using at the time - the Evaluate library from Hugging Face. But their IoU was equal to what my code computed as recall (sensitivity). I've opened a ticket with Hugging Face:
https://github.com/huggingface/evaluate/issues/421
They basically said they had copied that whole code from OpenMMLab and I should take it up with them. So I did:
https://github.com/open-mmlab/mmsegmentation/issues/2655
That was more than a week ago and there's still no reply. Meanwhile I've seen other bug reports which appear to point at the same problem:
https://github.com/open-mmlab/mmsegmentation/issues/2594
I'm pretty sure I am right. The definition of IoU is quite simple, and there isn't much room there for interpretation. Their code fails simple test cases.
My concern is - since they effectively calculate recall instead of IoU, and recall is larger than, or equal to IoU, and since the MMSegmentation library is widely used in image segmentation research, it's possible there are quite a few results floating out there in the literature that are a few percentage points larger than what they should be - e.g. 90% IoU instead of 85%.
Thoughts?
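For reference, the two metrics coincide only when there are no false positives. A minimal sketch with toy binary masks (our own illustration, not the MMSegmentation code path) shows how reporting recall in place of IoU inflates the number:

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union for binary masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union

def recall(pred, target):
    """Recall (sensitivity): fraction of true pixels that were predicted."""
    inter = np.logical_and(pred, target).sum()
    return inter / target.sum()

# A prediction covering the whole ground truth plus one false positive
# has perfect recall, but the false positive is penalised by IoU:
target = np.array([0, 0, 1, 1], dtype=bool)
pred   = np.array([0, 1, 1, 1], dtype=bool)
# iou = 2/3, recall = 1.0
```

Since recall >= IoU always holds, any metric that silently computes recall can only overstate segmentation quality, consistent with the concern above.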
submitted by /u/florinandrei
Has anyone tried to optimize the forward and backward passes using custom CUDA code or fused kernels to speed up the training of current LLMs? I have only seen FasterTransformer (NVIDIA/FasterTransformer) and similar tools, but they focus only on inference.
submitted by /u/Pretend_Ad3180
Language models are statistical methods predicting the succession of tokens in sequences, using natural text. Large language models (LLMs) are neural network-based language models with hundreds of millions (BERT) to over a trillion parameters (MiCS), and whose size makes single-GPU training impractical. LLMs’ generative abilities make them popular for text synthesis, summarization, machine translation, and […]
Back in 2018, I had the privilege of keynoting at one of Semantic Web Company’s events in Vienna, as well as attending the full event. It was a great opportunity to immerse myself in the Central European perspective on the utility of Linked Open Data standards and how those standards were being applied. I got…
The post FAIR Content: Better Chatbot Answers and Content Reusability at Scale appeared first on Data Science Central.
Hi everyone. I have tested RWKV [loss vs token position] for 10000 ctx4k+ documents in the Pile:
https://preview.redd.it/3ld2629h6xla1.png?width=941&format=png&auto=webp&s=008cb5eab35b86c3d9dc2378b1b78bdc98f50120
RWKV 1B5-4k is mostly flat after ctx1500, but 3B-4k and 7B-4k and 14B-4k have some slopes, and they are getting better. This debunks the old view that RNNs cannot model long ctxlens. These ctx4096 models are available at https://huggingface.co/BlinkDL.
We can predict that RWKV 100B will be great, and RWKV 1T is probably all you need :)
https://preview.redd.it/e3tbivtx6xla1.png?width=1174&format=png&auto=webp&s=53767f2e857edd429223472c0b67ef9ca31f2aa5
RWKV is simple. You can read https://arxiv.org/abs/2302.13939 (SpikeGPT) which is inspired by RWKV and has plenty of explanations. …
https://github.com/danthelion/talksheet
A small project showcasing how to create a "self-serve" analytical application, powered by the wonderful Langchain and DuckDB.
There are a bunch of features (like supporting other file formats such as parquet and json) planned for the future, just wanted to ship something quickly.
submitted by /u/dan_the_lion
( 43
min )
I have seen a lot of videos like this one which consist of Biden, Obama and Trump gaming together while they roast each other. Do you have any idea what tool is used for this?
Thank you 🙏🏼
submitted by /u/ElonJuniorMusk
( 41
min )
Hi,
I was wondering if there is an AI which can create slides from images.
Example: a screenshot of a slide as input.
I am not looking for something like https://www.beautiful.ai/ , but rather something that creates the elements in Google Slides, which I could then arrange.
Thank you!
Example image:
https://preview.redd.it/ca698mhgvqla1.png?width=1280&format=png&auto=webp&s=03a19d1df97855cb2d260f09b441a6fa8327a9ca
submitted by /u/rubicscube11
( 41
min )
https://www.notabot.tech/subscribe?ref=iBUStIpICm
An AI newsletter made by Haroon Choudery. Keeps me up to date on all the juicy AI news! 🤖
Post Your Opinions!
submitted by /u/Muatangz
( 41
min )
There is a field of modelling called "Survival Analysis" (https://en.wikipedia.org/wiki/Survival_analysis), in which the objective is to model the effect of different "characteristics" (e.g. medical measurements of patients such as height, age, weight, etc.) on the "time of some event" (e.g. death). Many models used in Survival Analysis are essentially a form of "Regression Models" (https://en.wikipedia.org/wiki/Regression_analysis) - and of course, these models are built, trained, and fine-tuned using some Optimization Algorithm (e.g. Newton-Raphson).
One of the most popular types of models used in Survival Analysis is called the "Cox Proportional-Hazards" Model (https://en.wikipedia.org/wiki/Proportional_hazards_model). As an example, here I have fit a Cox-PH Model to some dataset using th…
( 47
min )
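The Newton-Raphson fit of a Cox Proportional-Hazards model mentioned above can be sketched directly in NumPy for a single covariate on synthetic, fully observed (uncensored) data. The dataset, constants, and tolerances below are purely illustrative; in practice a library such as R's survival package or Python's lifelines does this fitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic survival data: hazard h(t|x) = exp(beta * x), all events observed
# (no censoring). Sorting by event time makes risk sets simple suffix sums.
n, beta_true = 500, 1.0
x = rng.normal(size=n)
t = rng.exponential(scale=1.0 / np.exp(beta_true * x))
x = x[np.argsort(t)]              # risk set of subject i is indices i..n-1

def score_and_hessian(beta):
    """Gradient and Hessian of the Cox partial log-likelihood (no ties)."""
    w = np.exp(beta * x)
    s0 = np.cumsum(w[::-1])[::-1]             # sum of w over each risk set
    s1 = np.cumsum((w * x)[::-1])[::-1]       # sum of w*x over each risk set
    s2 = np.cumsum((w * x * x)[::-1])[::-1]   # sum of w*x^2 over each risk set
    grad = np.sum(x - s1 / s0)
    hess = -np.sum(s2 / s0 - (s1 / s0) ** 2)  # negative (concave likelihood)
    return grad, hess

beta = 0.0
for _ in range(25):               # Newton-Raphson on the partial likelihood
    g, h = score_and_hessian(beta)
    beta -= g / h
```

With 500 subjects the recovered coefficient lands close to the true value of 1.0, up to sampling noise.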
All throughout the world, industrial processes are being increasingly redefined by IoT and AI. Smart energy grids, predictive maintenance sensors, and wearable gadgets like smartwatches and AR/VR goggles—IoT and AI have combined to unleash the potential of data quicker than ever. No sector of the economy is exempt from the advantages that IoT and AI…
The post Power of AI Automation In Agritech: Everything You Need To Know For Your Business appeared first on Data Science Central.
( 20
min )
Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should […]
( 7
min )
Machine learning (ML) can help companies make better business decisions through advanced analytics. Companies across industries apply ML to use cases such as predicting customer churn, demand forecasting, credit scoring, predicting late shipments, and improving manufacturing quality. In this blog post, we’ll look at how Amazon SageMaker Canvas delivers faster and more accurate model training times enabling […]
( 5
min )
MIT researchers trained logic-aware language models to reduce harmful stereotypes like gender and racial biases.
( 8
min )
The long-running programming competition encourages skills and friendships that last a lifetime.
( 11
min )
Here is our podcast episode with Sergey Levine from UC Berkeley where we discussed the evolution of deep reinforcement learning, how previous robotics approaches were replaced, and why offline RL is significant for future generalization.
submitted by /u/thejashGI
( 43
min )
The race toward sentient AI is on. A combination of hubris and competition between governments and societies akin to an arms race virtually ensures ‘sentient’ AI/AGI/ASI will be developed in relatively short order. There is increasing evidence such as the Othello Paper that is upending the auto-complete narrative already. LLMs having a world model implies theory of mind, and thus at least Functional Consciousness (albeit quantized for the time being) which likely in turn confers some form of partial non-anthropomorphic sentience, which will at some point open an ethical, societal, and religious Pandora’s box (see the Bodhisattva vow). The only thing we don’t know is just how far down this slippery slope we are at the moment. It’s also hard to argue against the runaway AI effect as well in …
( 43
min )
Financial market participants are faced with an overload of information that influences their decisions, and sentiment analysis stands out as a useful tool to help separate out the relevant and meaningful facts and figures. However, the same piece of news can have a positive or negative impact on stock prices, which presents a challenge for […]
( 14
min )
Amazon Kendra is an easy-to-use intelligent search service that allows you to integrate search capabilities with your applications so users can find information stored across data sources like Amazon Simple Storage Service, OneDrive, and Google Drive; applications such as Salesforce, SharePoint, and ServiceNow; and relational databases like Amazon Relational Database Service (Amazon RDS). Using […]
( 9
min )
March is already here and a new month always means new games, with a total of 19 joining the GeForce NOW library. Set off on a magical journey to restore Disney magic when Disney Dreamlight Valley joins the cloud later this month. Plus, the hunt is on with Capcom’s Monster Hunter Rise now available…
( 6
min )
These days, everyone is excited about Metaverse. The hype that Metaverse has created over the past few years is exceptional. Metaverse will give a whole new gaming experience to its users. In Metaverse, an immersive virtual world is created, in which users can play in a real-world setting with special effects with the help of VR and…
The post Metaverse in Gaming: Revolution In Gaming industry With Next-Generation Experience appeared first on Data Science Central.
( 23
min )
A process that seeks feedback from human specialists proves more effective at optimization than automated systems working alone.
( 9
min )
Appendicitis is among the most frequent reasons for pediatric abdominal
surgeries. With recent advances in machine learning, data-driven decision
support could help clinicians diagnose and manage patients while reducing the
number of non-critical surgeries. Previous decision support systems for
appendicitis focused on clinical, laboratory, scoring and computed tomography
data, mainly ignoring abdominal ultrasound, a noninvasive and readily available
diagnostic modality. To this end, we developed and validated interpretable
machine learning models for predicting the diagnosis, management and severity
of suspected appendicitis using ultrasound images. Our models were trained on a
dataset comprising 579 pediatric patients with 1709 ultrasound images
accompanied by clinical and laboratory data. Our methodological contribution is
the generalization of concept bottleneck models to prediction problems with
multiple views and incomplete concept sets. Notably, such models lend
themselves to interpretation and interaction via high-level concepts
understandable to clinicians without sacrificing performance or requiring
time-consuming image annotation when deployed.
( 2
min )
In computer vision, it is often observed that formulating regression problems
as a classification task yields better performance. We investigate this
curious phenomenon and provide a derivation to show that classification, with
the cross-entropy loss, outperforms regression with a mean squared error loss
in its ability to learn high-entropy feature representations. Based on the
analysis, we propose an ordinal entropy loss to encourage higher-entropy
feature spaces while maintaining ordinal relationships to improve the
performance of regression tasks. Experiments on synthetic and real-world
regression tasks demonstrate the importance and benefits of increasing entropy
for regression.
( 2
min )
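The regression-as-classification trick discussed above can be illustrated with a minimal NumPy sketch (not the paper's ordinal entropy loss): discretize a continuous target into quantile bins, train a linear softmax classifier with cross-entropy, and decode a continuous prediction as the probability-weighted bin center. All data and constants here are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression target, discretized into K ordinal classes by quantile binning.
n, K = 1000, 10
x = rng.uniform(-3, 3, size=n)
y = x / 3 + 0.05 * rng.normal(size=n)
edges = np.quantile(y, np.linspace(0, 1, K + 1)[1:-1])
labels = np.digitize(y, edges)               # class index in 0..K-1

# Linear softmax classifier trained with the cross-entropy gradient.
feats = np.stack([x, np.ones_like(x)], axis=1)
W = np.zeros((2, K))
for _ in range(5000):
    logits = feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * feats.T @ (p - np.eye(K)[labels]) / n

# Decode a continuous prediction as the probability-weighted bin center.
centers = np.array([y[labels == k].mean() for k in range(K)])
mae = np.abs((p * centers).sum(axis=1) - y).mean()
```

The soft decode keeps the prediction continuous even though training used a purely categorical loss.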
We propose a new high-performance activation function, Moderate Adaptive
Linear Units (MoLU), for deep neural networks. The MoLU is a simple,
beautiful and powerful activation function that can serve as a good main
activation function among hundreds of activation functions. Because the MoLU
is made up of elementary functions, not only is it an infinite diffeomorphism
(i.e. smooth and infinitely differentiable over its whole domain), but it
also decreases training time.
( 2
min )
Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.
( 2
min )
Recently, score-based generative models have been successfully employed for
the task of speech enhancement. A stochastic differential equation is used to
model the iterative forward process, where at each step environmental noise and
white Gaussian noise are added to the clean speech signal. While in the limit the
mean of the forward process ends at the noisy mixture, in practice it stops
earlier and thus only at an approximation of the noisy mixture. This results in
a discrepancy between the terminating distribution of the forward process and
the prior used for solving the reverse process at inference. In this paper, we
address this discrepancy. To this end, we propose a forward process based on a
Brownian bridge and show that such a process leads to a reduction of the
mismatch compared to previous diffusion processes. More importantly, we show
that our approach improves in objective metrics over the baseline process with
only half of the iteration steps and having one hyperparameter less to tune.
( 2
min )
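The key property of a Brownian bridge invoked above — the forward process terminates exactly at its endpoint, so there is no prior mismatch — can be checked with a short generic simulation (not the paper's speech-enhancement SDE; all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# A Brownian bridge from x0 (t=0) to xT (t=T) can be built from a plain
# Brownian motion W via  X_t = x0 + (t/T)*(xT - x0) + sigma*(W_t - (t/T)*W_T).
# Its mean interpolates the endpoints and its variance sigma^2 * t(T-t)/T
# vanishes at both ends, so every path hits xT exactly at t=T.
x0, xT, T, sigma, steps, paths = 0.0, 1.0, 1.0, 0.5, 1000, 5000
dt = T / steps
dW = np.sqrt(dt) * rng.normal(size=(paths, steps))
W = np.cumsum(dW, axis=1)                      # W_t on the grid t = dt..T
t = dt * np.arange(1, steps + 1)
X = x0 + (t / T) * (xT - x0) + sigma * (W - (t / T) * W[:, -1:])

mid_var = X[:, steps // 2].var()               # theory: sigma^2 * T / 4
end_err = np.abs(X[:, -1] - xT).max()          # bridge is pinned at xT
```

The empirical mid-point variance matches the theoretical value 0.0625, and the terminal error is zero up to floating point.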
Adversarial training is a standard technique for training adversarially
robust models. In this paper, we study adversarial training as an alternating
best-response strategy in a 2-player zero-sum game. We prove that even in a
simple scenario of a linear classifier and a statistical model that abstracts
robust vs. non-robust features, the alternating best-response strategy of such
a game may not converge. On the other hand, a unique pure Nash equilibrium of the
game exists and is provably robust. We support our theoretical results with
experiments, showing the non-convergence of adversarial training and the
robustness of Nash equilibrium.
( 2
min )
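The non-convergence of alternating best responses in a zero-sum game is easy to see in the textbook example of matching pennies, whose only Nash equilibrium is mixed. This toy sketch is an analogue of the abstract's claim, not the paper's linear-classifier setup:

```python
import numpy as np

# Matching pennies: row player wins (+1) on a match, loses (-1) otherwise.
# Alternating *pure* best responses cycle forever instead of converging to
# the unique mixed (50/50) Nash equilibrium.
A = np.array([[1, -1], [-1, 1]])     # row player's payoff matrix

row, col = 0, 0
history = []
for _ in range(12):
    row = int(np.argmax(A[:, col]))  # row best-responds to current column
    col = int(np.argmin(A[row, :]))  # column best-responds to the new row
    history.append((row, col))

# Play alternates with period 2 rather than settling down.
cycles = history[-1] == history[-3] and history[-1] != history[-2]
```

The joint play bounces between (0, 1) and (1, 0) indefinitely, mirroring how best-response adversarial training can fail to converge.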
In reinforcement learning for safety-critical settings, it is often desirable
for the agent to obey safety constraints at all points in time, including
during training. We present a novel neurosymbolic approach called SPICE to
solve this safe exploration problem. SPICE uses an online shielding layer based
on symbolic weakest preconditions to achieve a more precise safety analysis
than existing tools without unduly impacting the training process. We evaluate
the approach on a suite of continuous control benchmarks and show that it can
achieve comparable performance to existing safe learning techniques while
incurring fewer safety violations. Additionally, we present theoretical results
showing that SPICE converges to the optimal safe policy under reasonable
assumptions.
( 2
min )
Inverse molecular design is critical in material science and drug discovery,
where the generated molecules should satisfy certain desirable properties. In
this paper, we propose equivariant energy-guided stochastic differential
equations (EEGSDE), a flexible framework for controllable 3D molecule
generation under the guidance of an energy function in diffusion models.
Formally, we show that EEGSDE naturally exploits the geometric symmetry in 3D
molecular conformation, as long as the energy function is invariant to
orthogonal transformations. Empirically, under the guidance of designed energy
functions, EEGSDE significantly improves the baseline on QM9, in inverse
molecular design targeted to quantum properties and molecular structures.
Furthermore, EEGSDE is able to generate molecules with multiple target
properties by combining the corresponding energy functions linearly.
( 2
min )
Temporal distributional shifts, with underlying dynamics changing over time,
frequently occur in real-world time series and pose a fundamental challenge for
deep neural networks (DNNs). In this paper, we propose a novel deep sequence
model based on the Koopman theory for time series forecasting: Koopman Neural
Forecaster (KNF) which leverages DNNs to learn the linear Koopman space and the
coefficients of chosen measurement functions. KNF imposes appropriate inductive
biases for improved robustness against distributional shifts, employing both a
global operator to learn shared characteristics and a local operator to capture
changing dynamics, as well as a specially-designed feedback loop to
continuously update the learned operators over time for rapidly varying
behaviors. We demonstrate that KNF achieves superior performance compared
to the alternatives, on multiple time series datasets that are shown to suffer
from distribution shifts.
( 2
min )
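The simplest data-driven Koopman approximation, which KNF generalizes with neural networks, is classical dynamic mode decomposition: fit one linear operator by least squares on snapshot pairs. A minimal sketch on a noiseless 2D rotation (illustrative data, not the paper's benchmarks):

```python
import numpy as np

# Fit a single linear operator A by least squares so that z_{t+1} ~= A z_t
# (classical DMD, no neural nets). A rotation is captured exactly by a
# linear operator; KNF's point is handling dynamics that drift over time,
# which one fixed operator like this cannot do.
theta = 0.1
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
z = np.zeros((200, 2))
z[0] = [1.0, 0.0]
for t in range(199):
    z[t + 1] = A_true @ z[t]

Z, Zp = z[:-1], z[1:]                                # snapshot pairs
A_fit = np.linalg.lstsq(Z, Zp, rcond=None)[0].T      # solves Z A^T = Z'

err = np.abs(A_fit - A_true).max()
forecast = np.linalg.matrix_power(A_fit, 50) @ z[0]  # 50-step linear rollout
```

On this noiseless system the fitted operator recovers the rotation to machine precision, so the 50-step rollout lands at the true state [cos 5, sin 5].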
A kernel-based quantum classifier is the most practical and influential
quantum machine learning technique for the hyper-linear classification of
complex data. We propose a Variational Quantum Approximate Support Vector
Machine (VQASVM) algorithm that demonstrates empirical sub-quadratic run-time
complexity with quantum operations feasible even in NISQ computers. We
tested our algorithm on a toy example dataset using cloud-based NISQ
machines as a proof of concept. We also numerically investigated its
performance on the standard Iris flower and MNIST datasets to confirm its
practicality and scalability.
( 2
min )
We analyze a large corpus of police incident narrative documents to
understand the spatial distribution of their topics. The motivation is that
the police narrative in each incident report contains very fine-grained
information that is richer than the category manually assigned by the
police. Our approach is to split the corpus into topics using
two different unsupervised machine learning algorithms - Latent Dirichlet
Allocation and Non-negative Matrix Factorization. We validate the performance
of each learned topic model using model coherence. Then, using a k-nearest
neighbors density ratio estimation (kNN-DRE) approach that we propose, we
estimate the spatial density ratio per topic and use this for data discovery
and analysis of each topic, allowing for insights into the described incidents
at scale. We provide a qualitative assessment of each topic and highlight some
key benefits for using our kNN-DRE model for estimating spatial trends.
( 2
min )
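The abstract does not spell out its kNN-DRE estimator, but the generic kNN density-ratio idea can be sketched in 1D: each density is estimated from the distance to the k-th nearest sample, and the normalizing constants cancel in the ratio. Everything below (the Gaussians, n, k) is an illustrative assumption, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(4)

# kNN density estimate in 1D: p_hat(x) = k / (n * 2 * d_k(x)), where d_k(x)
# is the distance from x to its k-th nearest sample point. In the ratio the
# constants cancel:  p_hat(x)/q_hat(x) = (n_q * d_q(x)) / (n_p * d_p(x)).
def kth_neighbor_dist(xs, sample, k):
    d = np.abs(sample[None, :] - xs[:, None])   # pairwise |x - sample|
    return np.sort(d, axis=1)[:, k - 1]

n, k = 20000, 500
p_sample = rng.normal(0.0, 1.0, size=n)         # p = N(0, 1)
q_sample = rng.normal(1.0, 1.0, size=n)         # q = N(1, 1)

xs = np.array([0.0, 0.5, 1.0])
ratio = (n * kth_neighbor_dist(xs, q_sample, k)) / (
    n * kth_neighbor_dist(xs, p_sample, k))

true_ratio = np.exp(0.5 - xs)   # exact p(x)/q(x) for these two Gaussians
```

With 20,000 samples per density the estimated ratios track the closed-form values to within a few percent.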
In this paper, we study the generalization performance of global minima for
implementing empirical risk minimization (ERM) on over-parameterized deep ReLU
nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove
that there exist perfect global minima achieving almost optimal generalization
error bounds for numerous types of data under mild conditions. Since
over-parameterization is crucial to guarantee that the global minima of ERM on
deep ReLU nets can be realized by the widely used stochastic gradient descent
(SGD) algorithm, our results indeed fill a gap between optimization and
generalization.
( 2
min )
Fixing energy leakage caused by different anomalies can result in significant
energy savings and extended appliance life. Further, it assists grid operators
in scheduling their resources to meet the actual needs of end users, while
helping end users reduce their energy costs. In this paper, we analyze the
patterns pertaining to the power consumption of dishwashers used in two houses
of the REFIT dataset. Then two autoencoders (AEs) with 1D-CNN and TCN as
backbones are trained to differentiate the normal patterns from the abnormal
ones. Our results indicate that the TCN outperforms the 1D-CNN in detecting
anomalies in energy consumption. Finally, the data from the Fridge_Freezer
and the Freezer of house No. 3 in REFIT is also used to evaluate our approach.
( 2
min )
Audio Spectrogram Transformer models rule the field of Audio Tagging,
outrunning previously dominating Convolutional Neural Networks (CNNs). Their
superiority is based on the ability to scale up and exploit large-scale
datasets such as AudioSet. However, Transformers are demanding in terms of
model size and computational requirements compared to CNNs. We propose a
training procedure for efficient CNNs based on offline Knowledge Distillation
(KD) from high-performing yet complex transformers. The proposed training
schema and the efficient CNN design based on MobileNetV3 result in models
outperforming previous solutions in terms of parameter and computational
efficiency and prediction performance. We provide models of different
complexity levels, scaling from low-complexity models up to a new
state-of-the-art performance of .483 mAP on AudioSet. Source Code available at:
https://github.com/fschmid56/EfficientAT
( 2
min )
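Offline knowledge distillation in its most common form — the student matches temperature-softened teacher probabilities via KL divergence, mixed with cross-entropy on the hard labels — can be written in a few lines. This is a generic sketch of the standard KD objective, not the paper's exact training schema; the logits and hyperparameters are made up.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                                 # temperature softening
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    p_t = softmax(teacher_logits, T)          # soft targets from the teacher
    p_s = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 as in standard distillation.
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean() * T * T
    # Plain cross-entropy on the hard labels.
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * kl + (1 - alpha) * ce

teacher = np.array([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])
labels = np.array([0, 1])
loss_match = kd_loss(teacher, teacher, labels)   # student copies the teacher
loss_off = kd_loss(np.zeros_like(teacher), teacher, labels)
```

A student that reproduces the teacher's logits drives the KL term to zero, leaving only the hard-label cross-entropy.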
Self-supervised learning has significantly improved the performance of many
NLP tasks. However, how self-supervised learning discovers useful
representations, and why it is better than traditional approaches such as
probabilistic models, are still largely unknown. In this paper, we focus on the
context of topic modeling and highlight a key advantage of self-supervised
learning - when applied to data generated by topic models, self-supervised
learning can be oblivious to the specific model, and hence is less susceptible
to model misspecification. In particular, we prove that commonly used
self-supervised objectives based on reconstruction or contrastive samples can
both recover useful posterior information for general topic models.
Empirically, we show that the same objectives can perform on par with posterior
inference using the correct model, while outperforming posterior inference
using misspecified models.
( 2
min )
Ridesharing platforms are a type of two-sided marketplace where
"supply-demand balance" is critical for market efficiency and yet is complex
to define and analyze. We present a unified analytical framework based on the
graph-based equilibrium metric (GEM) for quantifying the supply-demand
spatiotemporal state and efficiency of a ridesharing marketplace. GEM was
developed as a generalized Wasserstein distance between the supply and demand
distributions in a ridesharing market and has been used as an evaluation metric
for algorithms expected to improve supply-demand alignment. Building upon GEM,
we develop SD-GEM, a dual-perspective (supply- and demand-side) representation
of rideshare market equilibrium. We show that there are often disparities
between the two views and examine how this dual-view leads to the notion of
market efficiency, in which we propose novel statistical tests for capturing
improvement and explaining the underlying driving factors.
( 2
min )
Federated Learning (FL) has emerged as an important machine learning area and
has received rapidly increasing research interest from the community. However,
catastrophic forgetting caused by data heterogeneity and partial participation
poses distinctive challenges for FL, which are detrimental to the performance.
To tackle the problems, we propose a new FL approach (namely GradMA), which
takes inspiration from continual learning to simultaneously correct the
server-side and worker-side update directions as well as take full advantage of
server's rich computing and memory resources. Furthermore, we elaborate a
memory reduction strategy to enable GradMA to accommodate FL with a large scale
of workers. We then analyze convergence of GradMA theoretically under the
smooth non-convex setting and show that its convergence rate achieves a linear
speed-up w.r.t. the number of sampled active workers. Finally, our
extensive experiments on various image classification tasks show that GradMA
achieves significant performance gains in accuracy and communication efficiency
compared to SOTA baselines.
( 2
min )
Estimation of the complete distribution of a random variable is a useful
primitive for both manual and automated decision making. This problem has
received extensive attention in the i.i.d. setting, but the arbitrary data
dependent setting remains largely unaddressed. Consistent with known
impossibility results, we present computationally felicitous time-uniform and
value-uniform bounds on the CDF of the running averaged conditional
distribution of a real-valued random variable which are always valid and
sometimes trivial, along with an instance-dependent convergence guarantee. The
importance-weighted extension is appropriate for estimating complete
counterfactual distributions of rewards given controlled experimentation data
exhaust, e.g., from an A/B test or a contextual bandit.
( 2
min )
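For contrast with the arbitrary-dependence setting above, the classical i.i.d. baseline it generalizes is the empirical CDF with a Dvoretzky-Kiefer-Wolfowitz confidence band. A minimal sketch (illustrative sample and constants):

```python
import numpy as np
from math import erf

rng = np.random.default_rng(6)

# DKW inequality: with probability >= 1 - delta,
#   sup_x |F_n(x) - F(x)| <= eps,  eps = sqrt(log(2/delta) / (2n)).
n, delta = 2000, 0.05
sample = np.sort(rng.normal(size=n))
eps = np.sqrt(np.log(2 / delta) / (2 * n))

def ecdf(xs):
    # Fraction of sample points <= each query point.
    return np.searchsorted(sample, xs, side="right") / n

xs = np.linspace(-3, 3, 601)
true_cdf = 0.5 * (1 + np.array([erf(v / np.sqrt(2)) for v in xs]))
max_dev = np.max(np.abs(ecdf(xs) - true_cdf))
```

With n = 2000 the band half-width is about 0.03, and the observed deviation of the empirical CDF from the standard normal CDF stays well inside it.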
Graph neural networks (GNNs) have been applied to a large variety of
applications in materials science and chemistry. Here, we recapitulate the
graph construction for crystalline (periodic) materials and investigate its
impact on GNN model performance. We suggest the asymmetric unit cell as a
representation to reduce the number of atoms by using all symmetries of the
system. With a simple but systematically built GNN architecture based on
message passing and line graph templates, we furthermore introduce a general
architecture (Nested Graph Network, NGN) that is applicable to a wide range of
tasks and systematically improves state-of-the-art results on the MatBench
benchmark datasets.
( 2
min )
This paper introduces a new sparse Bayesian learning (SBL) algorithm that
jointly recovers a temporal sequence of edge maps from noisy and under-sampled
Fourier data. The new method is cast in a Bayesian framework and uses a prior
that simultaneously incorporates intra-image information to promote sparsity in
each individual edge map with inter-image information to promote similarities
in any unchanged regions. By treating both the edges as well as the similarity
between adjacent images as random variables, there is no need to separately
form regions of change. Thus we avoid both additional computational cost as
well as any information loss resulting from pre-processing the image. Our
numerical examples demonstrate that our new method compares favorably with more
standard SBL approaches.
( 2
min )
We propose a class of models based on Fisher's Linear Discriminant (FLD) in
the context of domain adaptation. The class is the convex combination of two
hypotheses: i) an average hypothesis representing previously seen source tasks
and ii) a hypothesis trained on a new target task. For a particular generative
setting we derive the optimal convex combination of the two models under 0-1
loss, propose a computable approximation, and study the effect of various
parameter settings on the relative risks between the optimal hypothesis,
hypothesis i), and hypothesis ii). We demonstrate the effectiveness of the
proposed optimal classifier in the context of EEG- and ECG-based classification
settings and argue that the optimal classifier can be computed without access
to direct information from any of the individual source tasks. We conclude by
discussing further applications, limitations, and possible future directions.
( 2
min )
We study the consequences of mode-collapse of normalizing flows in the
context of lattice field theory. Normalizing flows allow for independent
sampling. For this reason, it is hoped that they can avoid the tunneling
problem of local-update MCMC algorithms for multi-modal distributions. In this
work, we first point out that the tunneling problem is also present for
normalizing flows but is shifted from the sampling to the training phase of the
algorithm. Specifically, normalizing flows often suffer from mode-collapse for
which the training process assigns vanishingly low probability mass to relevant
modes of the physical distribution. This may result in a significant bias when
the flow is used as a sampler in a Markov-Chain or with Importance Sampling. We
propose a metric to quantify the degree of mode-collapse and derive a bound on
the resulting bias. Furthermore, we propose various mitigation strategies in
particular in the context of estimating thermodynamic observables, such as the
free energy.
( 2
min )
This study addresses the problem of performing clustering in the presence of
two types of background knowledge: pairwise constraints and monotonicity
constraints. To achieve this, the formal framework to perform clustering under
monotonicity constraints is, firstly, defined, resulting in a specific distance
measure. Pairwise constraints are integrated afterwards by designing an
objective function which combines the proposed distance measure and a pairwise
constraint-based penalty term, in order to fuse both types of information. This
objective function can be optimized with an EM optimization scheme. The
proposed method serves as the first approach to the problem it addresses, as it
is the first method designed to work with the two types of background knowledge
mentioned above. Our proposal is tested on a variety of benchmark datasets and
on a real-world case study.
( 2
min )
Automatic recommendation systems based on deep neural networks have become
extremely popular during the last decade. Some of these systems can however be
used for applications which are ranked as High Risk by the European Commission
in the A.I. act, for instance online job candidate recommendation. When
used in the European Union, commercial AI systems for this purpose will then be
required to have proper statistical properties with regard to potential
discrimination they could engender. This motivated our contribution, where we
present a novel optimal transport strategy to mitigate undesirable algorithmic
biases in multi-class neural-network classification. Our strategy is model
agnostic and can be used on any multi-class classification neural-network
model. To anticipate the certification of recommendation systems using textual
data, we then used it on the Bios dataset, for which the learning task consists
in predicting the occupation of female and male individuals, based on their
LinkedIn biography. Results show that it can reduce undesired algorithmic
biases in this context to lower levels than a standard strategy.
( 2
min )
We introduce a new methodology dubbed "safe peeling" to accelerate the
resolution of l0-regularized least-squares problems via a Branch-and-Bound
(BnB) method. Our procedure enables us to tighten the convex relaxation considered
at each node of the BnB decision tree and therefore potentially allows for more
aggressive pruning. Numerical simulations show that our proposed methodology
leads to significant gains in terms of number of nodes explored and overall
solving time.
( 2
min )
Bayesian experimental design (BED) provides a powerful and general framework
for optimizing the design of experiments. However, its deployment often poses
substantial computational challenges that can undermine its practical use. In
this review, we outline how recent advances have transformed our ability to
overcome these challenges and thus utilize BED effectively, before discussing
some key areas for future development in the field.
( 2
min )
We consider the problem of tracking an unknown time-varying parameter that
characterizes the probabilistic evolution of a sequence of independent
observations. To this aim, we propose a stochastic gradient descent-based
recursive scheme in which the log-likelihood of the observations acts as a
time-varying gain function. We prove convergence in mean-square error in a
suitable neighbourhood of the unknown time-varying parameter and illustrate the details
of our findings in the case where data are generated from distributions
belonging to the exponential family.
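A minimal sketch of such a scheme for the simplest exponential-family case, a unit-variance Gaussian with drifting mean, where the score (gradient of the log-likelihood) is simply the innovation x - estimate. The step size and drift model below are illustrative choices, not the paper's:

```python
import random

def track_mean(observations, gamma=0.2):
    """SGD on the instantaneous Gaussian log-likelihood: est += gamma*(x - est)."""
    est = 0.0
    history = []
    for x in observations:
        est += gamma * (x - est)
        history.append(est)
    return history

random.seed(0)
true_theta = [0.01 * t for t in range(500)]           # slow linear drift
obs = [th + random.gauss(0.0, 1.0) for th in true_theta]
est = track_mean(obs)
print(abs(est[-1] - true_theta[-1]))                  # tracking error stays small
```

The constant step size gamma trades tracking lag against noise sensitivity, mirroring the neighbourhood-of-the-parameter guarantee stated in the abstract.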
( 2
min )
Self-supervised learning has significantly improved the performance of many
NLP tasks. However, how self-supervised learning discovers useful
representations, and why it is better than traditional approaches such as
probabilistic models, remain largely unknown. In this paper, we focus on the
context of topic modeling and highlight a key advantage of self-supervised
learning - when applied to data generated by topic models, self-supervised
learning can be oblivious to the specific model, and hence is less susceptible
to model misspecification. In particular, we prove that commonly used
self-supervised objectives based on reconstruction or contrastive samples can
both recover useful posterior information for general topic models.
Empirically, we show that the same objectives can perform on par with posterior
inference using the correct model, while outperforming posterior inference
using misspecified models.
( 2
min )
In an effort to address the training instabilities of GANs, we introduce a
class of dual-objective GANs with different value functions (objectives) for
the generator (G) and discriminator (D). In particular, we model each objective
using $\alpha$-loss, a tunable classification loss, to obtain
$(\alpha_D,\alpha_G)$-GANs, parameterized by $(\alpha_D,\alpha_G)\in
[0,\infty)^2$. For a sufficiently large number of samples and capacities for G
and D, we show that the resulting non-zero sum game simplifies to minimizing an
$f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$. In the
finite sample and capacity setting, we define estimation error to quantify the
gap in the generator's performance relative to the optimal setting with
infinite samples and obtain upper bounds on this error, showing it to be order
optimal under certain conditions. Finally, we highlight the value of tuning
$(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic
2D Gaussian mixture ring and the Stacked MNIST datasets.
( 2
min )
Estimation of the complete distribution of a random variable is a useful
primitive for both manual and automated decision making. This problem has
received extensive attention in the i.i.d. setting, but the arbitrary data
dependent setting remains largely unaddressed. Consistent with known
impossibility results, we present computationally felicitous time-uniform and
value-uniform bounds on the CDF of the running averaged conditional
distribution of a real-valued random variable which are always valid and
sometimes trivial, along with an instance-dependent convergence guarantee. The
importance-weighted extension is appropriate for estimating complete
counterfactual distributions of rewards given controlled experimentation data
exhaust, e.g., from an A/B test or a contextual bandit.
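For contrast, the classical i.i.d. machinery the paper moves beyond can be sketched with the Dvoretzky-Kiefer-Wolfowitz (DKW) band around the empirical CDF; the paper's bounds are additionally time-uniform and valid under arbitrary dependence:

```python
import math

# DKW: sup_x |F_n(x) - F(x)| <= eps with probability >= 1 - delta,
# where eps = sqrt(log(2/delta) / (2n)). Fixed sample size, i.i.d. only.

def ecdf(sample, x):
    return sum(1 for s in sample if s <= x) / len(sample)

def dkw_band(sample, x, delta=0.05):
    n = len(sample)
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    f = ecdf(sample, x)
    return max(0.0, f - eps), min(1.0, f + eps)

sample = [i / 100 for i in range(100)]   # uniform grid standing in for data
lo, hi = dkw_band(sample, 0.5)
print(round(lo, 3), round(hi, 3))
```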
( 2
min )
Temporal distributional shifts, with underlying dynamics changing over time,
frequently occur in real-world time series and pose a fundamental challenge for
deep neural networks (DNNs). In this paper, we propose a novel deep sequence
model based on the Koopman theory for time series forecasting: Koopman Neural
Forecaster (KNF) which leverages DNNs to learn the linear Koopman space and the
coefficients of chosen measurement functions. KNF imposes appropriate inductive
biases for improved robustness against distributional shifts, employing both a
global operator to learn shared characteristics and a local operator to capture
changing dynamics, as well as a specially-designed feedback loop to
continuously update the learned operators over time for rapidly varying
behaviors. We demonstrate that KNF achieves superior performance compared
to the alternatives, on multiple time series datasets that are shown to suffer
from distribution shifts.
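A hand-checkable baseline for the linear Koopman idea is Dynamic Mode Decomposition (DMD), which fits a finite-dimensional linear operator by least squares; KNF replaces the fixed measurements below with DNN-learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
K_true = np.array([[0.9, -0.2], [0.1, 0.8]])     # ground-truth linear dynamics
x = rng.normal(size=2)
states = [x]
for _ in range(30):
    x = K_true @ x + 1e-8 * rng.normal(size=2)   # near-noiseless trajectory
    states.append(x)
X = np.array(states[:-1])                        # (T, 2) current states
Y = np.array(states[1:])                         # (T, 2) next states

# Least squares for the operator: X @ M ~= Y, with M = K.T in row convention.
M, *_ = np.linalg.lstsq(X, Y, rcond=None)
K_est = M.T
print(np.round(K_est, 3))
```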
( 2
min )
Forest-based methods have recently gained in popularity for non-parametric
treatment effect estimation. Building on this line of work, we introduce causal
survival forests, which can be used to estimate heterogeneous treatment effects
in a survival and observational setting where outcomes may be right-censored.
Our approach relies on orthogonal estimating equations to robustly adjust for
both censoring and selection effects under unconfoundedness. In our
experiments, we find our approach to perform well relative to a number of
baselines.
( 2
min )
Automatic recommendation systems based on deep neural networks have become
extremely popular during the last decade. Some of these systems can however be
used for applications which are ranked as High Risk by the European Commission
in the A.I. act, as for instance for online job candidate recommendation. When
used in the European Union, commercial AI systems for this purpose will then be
required to have proper statistical properties with regard to the potential
discrimination they could engender. This motivated our contribution, where we
present a novel optimal transport strategy to mitigate undesirable algorithmic
biases in multi-class neural-network classification. Our strategy is model
agnostic and can be used on any multi-class classification neural-network
model. To anticipate the certification of recommendation systems using textual
data, we then used it on the Bios dataset, for which the learning task consists
in predicting the occupation of female and male individuals, based on their
LinkedIn biography. Results show that it can reduce undesired algorithmic
biases in this context to lower levels than a standard strategy.
( 2
min )
Kernel methods, being supported by a well-developed theory and coming with
efficient algorithms, are among the most popular and successful machine
learning techniques. From a mathematical point of view, these methods rest on
the concept of kernels and function spaces generated by kernels, so called
reproducing kernel Hilbert spaces. Motivated by recent developments of learning
approaches in the context of interacting particle systems, we investigate
kernel methods acting on data with many measurement variables. We show the
rigorous mean field limit of kernels and provide a detailed analysis of the
limiting reproducing kernel Hilbert space. Furthermore, several examples of
kernels, that allow a rigorous mean field limit, are presented.
( 2
min )
Semi-supervised learning aims to train a model using limited labels.
State-of-the-art semi-supervised methods for image classification such as PAWS
rely on self-supervised representations learned with large-scale unlabeled but
curated data. However, PAWS is often less effective when using real-world
unlabeled data that is uncurated, e.g., contains out-of-class data. We propose
RoPAWS, a robust extension of PAWS that can work with real-world unlabeled
data. We first reinterpret PAWS as a generative classifier that models
densities using kernel density estimation. From this probabilistic perspective,
we calibrate its prediction based on the densities of labeled and unlabeled
data, which leads to a simple closed-form solution from the Bayes' rule. We
demonstrate that RoPAWS significantly improves PAWS for uncurated Semi-iNat by
+5.3% and curated ImageNet by +0.4%.
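The generative-classifier reading of PAWS can be sketched with per-class kernel density estimates combined through Bayes' rule; the 1-D data and bandwidth below are illustrative, not RoPAWS itself:

```python
import math

def kde(points, x, bandwidth=0.5):
    """Gaussian kernel density estimate over a class's labeled points."""
    c = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

def posterior(class_points, x):
    """P(class | x) via Bayes' rule with equal class priors."""
    dens = [kde(pts, x) for pts in class_points]
    z = sum(dens)
    return [d / z for d in dens]

labeled = [[-2.0, -1.5, -1.0],   # class 0
           [1.0, 1.5, 2.0]]      # class 1
print(posterior(labeled, -1.2))  # strongly favors class 0
```

Calibrating predictions by these densities is what lets a model down-weight unlabeled points that sit far from every class, i.e., likely out-of-class data.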
( 2
min )
Partitioning a set of elements into subsets of a priori unknown sizes is
essential in many applications. These subset sizes are rarely explicitly
learned - be it the cluster sizes in clustering applications or the number of
shared versus independent generative latent factors in weakly-supervised
learning. Probability distributions over correct combinations of subset sizes
are non-differentiable due to hard constraints, which prohibit gradient-based
optimization. In this work, we propose the differentiable hypergeometric
distribution. The hypergeometric distribution models the probability of
different group sizes based on their relative importance. We introduce
reparameterizable gradients to learn the importance between groups and
highlight the advantage of explicitly learning the size of subsets in two
typical applications: weakly-supervised learning and clustering. In both
applications, we outperform previous approaches, which rely on suboptimal
heuristics to model the unknown size of groups.
( 2
min )
The most recent multi-source covariate shift algorithm is an efficient
hyperparameter optimization algorithm for missing target output. In this paper,
we extend this algorithm to the framework of federated learning. For data
islands in federated learning and covariate shift adaptation, we propose the
federated domain adaptation estimate of the target risk which is asymptotically
unbiased with a desirable asymptotic variance property. We construct a weighted
model for the target task and propose the federated covariate shift adaptation
algorithm, which performs well in our setting. The efficacy of our method is
justified both theoretically and empirically.
( 2
min )
This paper introduces a new framework of algebraic equivalence relations
between time series and new distance metrics between them, then applies these
to investigate the Australian "Black Summer" bushfire season of 2019-2020.
First, we introduce a general framework for defining equivalence between time
series, heuristically intended to be equivalent if they differ only up to
noise. Our first specific implementation is based on using change point
algorithms and comparing statistical quantities such as mean or variance in
stationary segments. We thus derive the existence of such equivalence relations
on the space of time series, such that the quotient spaces can be equipped with
a metrizable topology. Next, we illustrate specifically how to define and
compute such distances among a collection of time series and perform clustering
and additional analysis thereon. Then, we apply these insights to analyze air
quality data across New South Wales, Australia, during the 2019-2020 bushfires.
There, we investigate structural similarity with respect to this data and
identify locations that were impacted anomalously by the fires relative to
their location. This may have implications regarding the appropriate management
of resources to avoid gaps in the defense against future fires.
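A toy version of the first implementation described above: detect a single least-squares change point, summarize each stationary segment by its mean, and compare two series through those summaries (the real framework uses proper change-point algorithms and metrics on the quotient space):

```python
def mean(xs):
    return sum(xs) / len(xs)

def sse(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def one_change_point(series):
    """Index that best splits the series into two constant-mean segments."""
    return min(range(1, len(series)), key=lambda k: sse(series[:k]) + sse(series[k:]))

def segment_means(series):
    k = one_change_point(series)
    return mean(series[:k]), mean(series[k:])

def distance(a, b):
    """Crude distance: compare the two series' segment-mean summaries."""
    ma, mb = segment_means(a), segment_means(b)
    return max(abs(x - y) for x, y in zip(ma, mb))

a = [0.0] * 10 + [5.0] * 10
b = [0.1] * 12 + [5.1] * 8        # same two regimes, up to a small offset
print(one_change_point(a), round(distance(a, b), 2))
```

Series whose segment summaries coincide are "equivalent up to noise" in the sense sketched above, so this distance descends to the quotient space.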
( 2
min )
Traffic systems can operate in different modes. In a previous work, we
identified these modes as different quasi-stationary states in the correlation
structure. Here, we analyze the transitions between such quasi-stationary
states, i.e., how the system changes its operational mode. In the longer run
this might be helpful to forecast the time evolution of correlation patterns in
traffic. Taking the Cologne orbital motorways as an example, we construct a state
transition network for each quarter of 2015 and find a seasonal dependence for
those quasi-stationary states in the traffic system. Using the PageRank
algorithm, we identify and explore the dominant states which occur frequently
within a moving time window of 60 days in 2015. To the best of our knowledge,
this is the first study of this type for traffic systems.
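The PageRank step can be sketched as a plain power iteration on a small state-transition network (states A-D are illustrative, not the Cologne data):

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration for PageRank on a dict of out-links."""
    nodes = sorted(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            inflow = sum(rank[m] / len(links[m]) for m in nodes if n in links[m])
            new[n] = (1.0 - damping) / len(nodes) + damping * inflow
        rank = new
    return rank

# Each key lists the states reachable from it (observed transitions).
transitions = {"A": ["B"], "B": ["A", "C"], "C": ["A"], "D": ["A"]}
ranks = pagerank(transitions)
print(max(ranks, key=ranks.get))  # the dominant state
```

States that receive many transitions from other frequently visited states get high rank, which is exactly the "dominant operational mode" notion used above.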
( 2
min )
Clustering is a widely used technique with a long and rich history in a
variety of areas. However, most existing algorithms do not scale well to large
datasets, or are missing theoretical guarantees of convergence. This paper
introduces a provably robust clustering algorithm based on loss minimization
that performs well on Gaussian mixture models with outliers. It provides
theoretical guarantees that the algorithm obtains high accuracy with high
probability under certain assumptions. Moreover, it can also be used as an
initialization strategy for $k$-means clustering. Experiments on real-world
large-scale datasets demonstrate the effectiveness of the algorithm when
clustering a large number of clusters, and a $k$-means algorithm initialized by
the algorithm outperforms many of the classic clustering methods in both speed
and accuracy, while scaling well to large datasets such as ImageNet.
( 2
min )
We propose a class of models based on Fisher's Linear Discriminant (FLD) in
the context of domain adaptation. The class is the convex combination of two
hypotheses: i) an average hypothesis representing previously seen source tasks
and ii) a hypothesis trained on a new target task. For a particular generative
setting we derive the optimal convex combination of the two models under 0-1
loss, propose a computable approximation, and study the effect of various
parameter settings on the relative risks between the optimal hypothesis,
hypothesis i), and hypothesis ii). We demonstrate the effectiveness of the
proposed optimal classifier in the context of EEG- and ECG-based classification
settings and argue that the optimal classifier can be computed without access
to direct information from any of the individual source tasks. We conclude by
discussing further applications, limitations, and possible future directions.
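A toy sketch of the model class: a convex combination, with weight lam, of two linear discriminant scores standing in for hypotheses i) and ii). The discriminants and weights are made-up numbers; the paper derives the optimal combination for its generative setting:

```python
def linear_score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def combined_predict(source_h, target_h, lam, x):
    """Label 1 if the lam-blended discriminant score is positive."""
    s = lam * linear_score(*source_h, x) + (1.0 - lam) * linear_score(*target_h, x)
    return 1 if s > 0 else 0

source_h = ([1.0, 0.0], -0.5)    # hypothesis i): source-average discriminant
target_h = ([0.8, 0.4], -0.6)    # hypothesis ii): target-trained discriminant
x = [0.6, -0.5]                  # a point on which the two hypotheses disagree
print([combined_predict(source_h, target_h, lam, x) for lam in (0.0, 0.5, 1.0)])
```

Sweeping lam interpolates between trusting the target-trained model (lam = 0) and the source average (lam = 1), which is the trade-off the risk analysis studies.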
( 2
min )
It would be something similar to mnist-ready (https://github.com/saoj/mnist-ready) in Ruby, but in Python. See below:
digit = MNIST.all_set[0] # first one
# An integer corresponding to the digit of the image
puts digit.label # => 7
# The pixels is a one-dimensional array of 784 (28 x 28) pixel values from 0 to 255
puts digit.pixels.size # => 784
puts digit.pixels.inspect # => [0, 0, 0, 0, ...
It has this nice feature which allows you to see the digits:
puts digit.ascii_image  # renders the 28x28 image as ASCII art in a bordered frame, labeled with the digit (here, 7)
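A minimal Python sketch of a comparable interface. The Digit class, its brightness ramp, and the direct pixel input are hypothetical; wiring it to an actual MNIST loader is left out, so pixels are passed in as a flat list of 784 grayscale values (0-255):

```python
class Digit:
    def __init__(self, label, pixels):
        assert len(pixels) == 784        # 28 x 28 grayscale image
        self.label = label
        self.pixels = pixels

    @property
    def ascii_image(self):
        """Render the 28x28 image, mapping brighter pixels to denser characters."""
        ramp = " .:-=+*#%@"              # 10 brightness levels
        rows = []
        for r in range(28):
            row = self.pixels[r * 28:(r + 1) * 28]
            rows.append("".join(ramp[min(p, 255) * (len(ramp) - 1) // 255] for p in row))
        return "\n".join(rows)

digit = Digit(7, [0] * 784)
print(digit.label)             # => 7
print(len(digit.pixels))       # => 784
```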
submitted by /u/niosurfer
[link] [comments]
( 43
min )
Hi everyone. Now ChatRWKV v2 can split RWKV to multiple GPUs, or stream layers (compute layer-by-layer), so you can run RWKV 14B with as few as 3G VRAM. https://github.com/BlinkDL/ChatRWKV
Example:
'cuda:0 fp16 *10 -> cuda:1 fp16 *8 -> cpu fp32' = first 10 layers on cuda:0 fp16, then 8 layers on cuda:1 fp16, then on cpu fp32
'cuda fp16 *20+' = first 20 layers on cuda fp16, then stream the rest on it
And RWKV is now a pip package: https://pypi.org/project/rwkv/
os.environ['RWKV_JIT_ON'] = '1'
os.environ["RWKV_CUDA_ON"] = '0' # if '1' then compile CUDA kernel for seq mode (much faster)
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS
pipeline = PIPELINE(model, "20B_tokenizer.json") # find it in https://github.com/BlinkDL/ChatRWKV
# download models: https://hugg…
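The strategy strings above follow a simple grammar: stages separated by '->', each of the form '&lt;device&gt; &lt;dtype&gt; [*N[+]]'. A sketch of a parser for that documented format (this mimics the description only and is not ChatRWKV's own parser):

```python
def parse_strategy(strategy):
    """Parse a layer-allocation string into (device, dtype, n_layers, stream) tuples."""
    stages = []
    for part in strategy.split("->"):
        tokens = part.split()
        device, dtype = tokens[0], tokens[1]
        n_layers, stream = None, False        # None = "all remaining layers"
        if len(tokens) > 2 and tokens[2].startswith("*"):
            spec = tokens[2][1:]
            if spec.endswith("+"):            # '*20+' = 20 resident, rest streamed
                stream = True
                spec = spec[:-1]
            n_layers = int(spec)
        stages.append((device, dtype, n_layers, stream))
    return stages

print(parse_strategy("cuda:0 fp16 *10 -> cuda:1 fp16 *8 -> cpu fp32"))
print(parse_strategy("cuda fp16 *20+"))
```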
( 45
min )
The fashion industry is a highly lucrative business, with an estimated value of $2.1 trillion by 2025, as reported by the World Bank. This field encompasses a diverse range of segments, such as the creation, manufacture, distribution, and sales of clothing, shoes, and accessories. The industry is in a constant state of change, with new […]
( 15
min )
This post is co-written with Suhyoung Kim, General Manager at KakaoGames Data Analytics Lab. Kakao Games is a top video game publisher and developer headquartered in South Korea. It specializes in developing and publishing games on PC, mobile, and virtual reality (VR) serving globally. In order to maximize its players’ experience and improve the efficiency […]
( 14
min )
Amazon Comprehend is a managed AI service that uses natural language processing (NLP) with ready-made intelligence to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. The ability to train custom models through the Custom classification and Custom entity […]
( 10
min )
The world we live in is rapidly changing, and so are the data and features that companies and customers use to train their models. Retraining models to keep them in sync with these changes is critical to maintain accuracy. Therefore, you need an agile and dynamic approach to keep models up to date and adapt […]
( 10
min )
The quest for knowledge at work can feel like searching for a needle in a haystack. But what if the haystack itself could reveal where the needle is? That’s the promise of large language models, or LLMs, the subject of this week’s episode of the NVIDIA AI Podcast featuring Deedy Das and Eddie Zhou, founding…
( 5
min )
Please provide feedback so I can make it better and help the AI movement.
aitoptools.com
submitted by /u/aitoptools
[link] [comments]
( 41
min )
Developers can now integrate ChatGPT and Whisper models into their apps and products through our API.
( 5
min )
Announcements Are Generative Adversarial Networks Really Useful? Such a question may seem as coming from a dinosaur, averse to change. Or from someone selling traditional methods and badmouthing anything that feels threatening to his business. This is not the case here: I always try to stay neutral, and usually – while typically not a first… Read More »DSC Weekly 28 February 2023 – Generative Adversarial Networks (GANs): Are They Really Useful?
The post DSC Weekly 28 February 2023 – Generative Adversarial Networks (GANs): Are They Really Useful? appeared first on Data Science Central.
( 21
min )
Back in 2018, I had the privilege of keynoting at one of Semantic Web Company’s events in Vienna, as well as attending the full event. It was a great opportunity to immerse myself in the Central European perspective on the utility of Linked Open Data standards and how those standards were being applied. I got… Read More »FAIR Content: Better Chatbot Answers and Content Reusability at Scale
The post FAIR Content: Better Chatbot Answers and Content Reusability at Scale appeared first on Data Science Central.
( 21
min )
In today’s highly competitive market, performing data analytics using machine learning (ML) models has become a necessity for organizations. It enables them to unlock the value of their data, identify trends, patterns, and predictions, and differentiate themselves from their competitors. For example, in the healthcare industry, ML-driven analytics can be used for diagnostic assistance and […]
( 12
min )
Fraud detection is an important problem that has applications in financial services, social media, ecommerce, gaming, and other industries. This post presents an implementation of a fraud detection solution using the Relational Graph Convolutional Network (RGCN) model to predict the probability that a transaction is fraudulent through both the transductive and inductive inference modes. You can deploy our implementation to an Amazon SageMaker endpoint as a real-time fraud detection solution, without requiring external graph storage or orchestration, thereby significantly reducing the deployment cost of the model.
( 11
min )
As the meteoric rise of ChatGPT demonstrates, generative AI can unlock enormous potential for companies, teams and individuals. Whether simplifying time-consuming tasks or accelerating 3D workflows to boost creativity and productivity, generative AI is already making an impact across industries — and there’s much more to come. How generative AI is paving the way for…
( 5
min )
Brian Spears says his children will enjoy a more sustainable planet, thanks in part to AI and high performance computing (HPC) simulations. “I believe I’ll see fusion energy in my lifetime, and I’m confident my daughters will see a fusion-powered world,” said the 45-year-old principal investigator at Lawrence Livermore National Laboratory who helped demonstrate the…
( 6
min )
ManvsMachine steps In the NVIDIA Studio this week to share insights behind fractal art — which uses algorithms to artistically represent calculations — derived from geometric objects as digital images and animations.
( 6
min )
Streaming video on PCs through Google Chrome and Microsoft Edge browsers is getting a GeForce RTX-sized upgrade today with the release of RTX Video Super Resolution (VSR). Nearly 80% of internet bandwidth today is streaming video. And 90% of that content streams at 1080p or lower, including from popular sources like Twitch.tv, YouTube, Netflix, Disney+…
( 6
min )
Inferring causal structure from data is a challenging task of fundamental
importance in science. Observational data are often insufficient to identify a
system's causal structure uniquely. While conducting interventions (i.e.,
experiments) can improve the identifiability, such samples are usually
challenging and expensive to obtain. Hence, experimental design approaches for
causal discovery aim to minimize the number of interventions by estimating the
most informative intervention target. In this work, we propose a novel
Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts'
the gradient estimator of a gradient-based causal discovery framework to
provide signals for the intervention acquisition function. We provide extensive
experiments in simulated and real-world datasets and demonstrate that GIT
performs on par with competitive baselines, surpassing them in the low-data
regime.
( 2
min )
In this work, we propose a self-improving artificial intelligence system to
enhance the safety performance of reinforcement learning (RL)-based autonomous
driving (AD) agents using black-box verification methods. RL algorithms have
become popular in AD applications in recent years. However, the performance of
existing RL algorithms heavily depends on the diversity of training scenarios.
A lack of safety-critical scenarios during the training phase could result in
poor generalization performance in real-world driving applications. We propose
a novel framework in which the weaknesses of the training set are explored
through black-box verification methods. After discovering AD failure scenarios,
the RL agent's training is re-initiated via transfer learning to improve its
performance in previously unsafe scenarios. Simulation results demonstrate that
our approach efficiently discovers safety failures of action decisions in
RL-based adaptive cruise control (ACC) applications and significantly reduces
the number of vehicle collisions through iterative applications of our method.
The source code is publicly available at
https://github.com/data-and-decision-lab/self-improving-RL.
( 2
min )
In the end-of-line test of geared motors, the evaluation of product quality
is important. Due to time constraints and the high diversity of variants,
acoustic measurements are more economical than vibration measurements.
However, the acoustic data is affected by industrial disturbing noise.
Therefore, the aim of this study is to investigate the robustness of features
used for anomaly detection in geared motor end-of-line testing. A real-world
dataset with typical faults and acoustic disturbances is recorded by an
acoustic array. This includes industrial noise from the production and
systematically produced disturbances, used to compare the robustness. Overall,
it is proposed to apply features extracted from a log-envelope spectrum
together with psychoacoustic features. The anomaly detection is done by using
the isolation forest or the more universal bagging random miner. Most
disturbances can be circumvented, while the use of a hammer or air pressure
often causes problems. In general, these results are important for condition
monitoring tasks that are based on acoustic or vibration measurements.
Furthermore, a real-world problem description is presented to improve common
signal processing and machine learning tasks.
( 2
min )
The recent literature on online learning to rank (LTR) has established the
utility of prior knowledge to Bayesian ranking bandit algorithms. However, a
major limitation of existing work is the requirement for the prior used by the
algorithm to match the true prior. In this paper, we propose and analyze
adaptive algorithms that address this issue and additionally extend these
results to the linear and generalized linear models. We also consider scalar
relevance feedback on top of click feedback. Moreover, we demonstrate the
efficacy of our algorithms using both synthetic and real-world experiments.
( 2
min )
Research on deep reinforcement learning (DRL) based production scheduling
(PS) has gained a lot of attention in recent years, primarily due to the high
demand for optimizing scheduling problems in diverse industry settings.
Numerous studies are carried out and published as stand-alone experiments that
often vary only slightly with respect to problem setups and solution
approaches. The programmatic core of these experiments is typically very
similar. Despite this fact, no standardized and resilient framework for
experimentation on PS problems with DRL algorithms could be established so far.
In this paper, we introduce schlably, a Python-based framework that provides
researchers a comprehensive toolset to facilitate the development of PS
solution strategies based on DRL. schlably eliminates the redundant overhead
work that the creation of a sturdy and flexible backbone requires and increases
the comparability and reusability of conducted research work.
( 2
min )
Distributed deep learning (DDL) systems strongly depend on network
performance. Current electronic packet switched (EPS) network architectures and
technologies suffer from variable diameter topologies, low-bisection bandwidth
and over-subscription affecting completion time of communication and collective
operations.
We introduce a near-exascale, full-bisection bandwidth, all-to-all,
single-hop, all-optical network architecture with nanosecond reconfiguration
called RAMP, which supports large-scale distributed and parallel computing
systems (12.8 Tbps per node for up to 65,536 nodes).
For the first time, a custom RAMP-x MPI strategy and a network transcoder are
proposed to run MPI collective operations across the optical circuit switched
(OCS) network in a schedule-less and contention-less manner. RAMP achieves
7.6-171$\times$ speed-up in completion time across all MPI operations compared
to realistic EPS and OCS counterparts. It can also deliver a 1.3-16$\times$ and
7.8-58$\times$ reduction in Megatron and DLRM training time respectively while
offering 42-53$\times$ and 3.3-12.4$\times$ improvement in energy consumption
and cost respectively.
( 2
min )
In the context of keyword spotting (KWS), the replacement of handcrafted
speech features by learnable features has not yielded superior KWS performance.
In this study, we demonstrate that filterbank learning outperforms handcrafted
speech features for KWS whenever the number of filterbank channels is severely
decreased. Reducing the number of channels might yield certain KWS performance
drop, but also a substantial energy consumption reduction, which is key when
deploying common always-on KWS on low-resource devices. Experimental results on
a noisy version of the Google Speech Commands Dataset show that filterbank
learning adapts to noise characteristics to provide a higher degree of
robustness to noise, especially when dropout is integrated. Thus, switching
from typically used 40-channel log-Mel features to 8-channel learned features
leads to a relative KWS accuracy loss of only 3.5% while simultaneously
achieving a 6.3x energy consumption reduction.
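For reference, the handcrafted baseline being replaced: a triangular mel filterbank built from the standard mel scale mel(f) = 2595 * log10(1 + f/700). Going from 40 to 8 channels just means fewer, wider triangles, whereas the learned filterbanks make these shapes trainable (parameters below are illustrative):

```python
import numpy as np

def mel_filterbank(n_channels, n_fft=512, sr=16000):
    """Triangular mel filterbank matrix of shape (n_channels, n_fft//2 + 1)."""
    def to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def from_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(to_mel(0.0), to_mel(sr / 2.0), n_channels + 2)
    bins = np.floor((n_fft + 1) * from_mel(mels) / sr).astype(int)
    fb = np.zeros((n_channels, n_fft // 2 + 1))
    for ch in range(n_channels):
        left, center, right = bins[ch], bins[ch + 1], bins[ch + 2]
        for k in range(left, center):          # rising edge of the triangle
            fb[ch, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fb[ch, k] = (right - k) / max(right - center, 1)
    return fb

print(mel_filterbank(8).shape, mel_filterbank(40).shape)
```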
( 2
min )
The imputation of missing values represents a significant obstacle for many
real-world data analysis pipelines. Here, we focus on time series data and put
forward SSSD, an imputation model that relies on two emerging technologies,
(conditional) diffusion models as state-of-the-art generative models and
structured state space models as internal model architecture, which are
particularly suited to capture long-term dependencies in time series data. We
demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic
imputation and forecasting performance on a broad range of data sets and
different missingness scenarios, including the challenging blackout-missing
scenarios, where prior approaches failed to provide meaningful results.
( 2
min )
In this paper, we study first-order algorithms for solving fully composite
optimization problems over bounded sets. We treat the differentiable and
non-differentiable parts of the objective separately, linearizing only the
smooth components. This provides us with new generalizations of the classical
and accelerated Frank-Wolfe methods, that are applicable to non-differentiable
problems whenever we can access the structure of the objective. We prove global
complexity bounds for our algorithms that are optimal in several settings.
( 2
min )
This paper describes our participation in SemEval-2023 Task 9, Intimacy
Analysis of Multilingual Tweets. We fine-tune some of the most popular
transformer models with the training dataset and synthetic data generated by
different data augmentation techniques. During the development phase, our best
results were obtained by using XLM-T. Data augmentation techniques provide a
very slight improvement in the results. Our system ranked in the 27th position
out of the 45 participating systems. Despite its modest ranking, our system
shows promising results in languages such as Portuguese, English, and Dutch.
All our code is available in the repository
\url{https://github.com/isegura/hulat_intimacy}.
( 2
min )
We study the problem of inferring heterogeneous treatment effects (HTEs) from
time-to-event data in the presence of competing events. Albeit its great
practical relevance, this problem has received little attention compared to its
counterparts studying HTE estimation without time-to-event data or competing
events. We take an outcome modeling approach to estimating HTEs, and consider
how and when existing prediction models for time-to-event data can be used as
plug-in estimators for potential outcomes. We then investigate whether
competing events present new challenges for HTE estimation -- in addition to
the standard confounding problem --, and find that, because there are multiple
definitions of causal effects in this setting -- namely total, direct and
separable effects --, competing events can act as an additional source of
covariate shift depending on the desired treatment effect interpretation and
associated estimand. We theoretically analyze and empirically illustrate when
and how these challenges play a role when using generic machine learning
prediction models for the estimation of HTEs.
( 2
min )
In this study, we validate the findings of previously published papers,
showing the feasibility of an Electroencephalography (EEG) based gaze
estimation. Moreover, we extend previous research by demonstrating that with
only a slight drop in model performance, we can significantly reduce the number
of electrodes, indicating that a high-density, expensive EEG cap is not
necessary for the purposes of EEG-based eye tracking. Using data-driven
approaches, we establish which electrode clusters impact gaze estimation and
how the different types of EEG data preprocessing affect the models'
performance. Finally, we also inspect which recorded frequencies are most
important for the defined tasks.
( 2
min )
In the present work, we introduce a novel approach to enhance the precision
of reduced order models by exploiting a multi-fidelity perspective and
DeepONets. Reduced models provide a real-time numerical approximation by
simplifying the original model. The error introduced by this operation is
usually neglected, sacrificed in exchange for fast computation. We propose to
couple model reduction with machine-learned residual learning, such that the
above-mentioned error can be learnt by a neural network and
inferred for new predictions. We emphasize that the framework maximizes the
exploitation of the high-fidelity information, using it for building the
reduced order model and for learning the residual. In this work we explore the
integration of proper orthogonal decomposition (POD), and gappy POD for sensors
data, with the recent DeepONet architecture. Numerical investigations for a
parametric benchmark function and a nonlinear parametric Navier-Stokes problem
are presented.
( 2
min )
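The POD-plus-learned-residual idea can be illustrated with a toy sketch; here a plain least-squares fit on hand-picked parameter features stands in for the DeepONet, and the snapshot function is a made-up stand-in for a high-fidelity solver:

```python
import numpy as np

# High-fidelity snapshots u(mu) on a 1D grid for training parameters mu
# (the snapshot function below is a made-up stand-in for a solver).
x = np.linspace(0.0, 1.0, 100)
mus = np.linspace(0.5, 2.0, 20)
snapshots = np.stack([np.sin(mu * np.pi * x) + 0.05 * x**2 for mu in mus], axis=1)

# POD: a truncated SVD of the snapshot matrix gives the reduced basis.
U, S, Vt = np.linalg.svd(snapshots, full_matrices=False)
basis = U[:, :3]                               # keep the first 3 POD modes

def project(u):
    return basis @ (basis.T @ u)               # reduced-order approximation

# Residual learning: fit the projection error as a function of mu with a
# simple linear-in-features model (a stand-in for the DeepONet).
residuals = snapshots - project(snapshots)     # (grid points, n_params)
feats = np.stack([np.ones_like(mus), mus, mus**2, np.sin(np.pi * mus)], axis=1)
coeffs, *_ = np.linalg.lstsq(feats, residuals.T, rcond=None)

# Corrected prediction = reduced-order part + learned residual.
corrected = project(snapshots) + (feats @ coeffs).T
```

The point of the framework is exactly this split: the high-fidelity data builds the reduced basis and also supervises the residual model that corrects it.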
Federated learning (FL) was originally regarded as a framework for
collaborative learning among clients with data privacy protection through a
coordinating server. In this paper, we propose a new active membership
inference (AMI) attack carried out by a dishonest server in FL. In AMI attacks,
the server crafts and embeds malicious parameters into global models to
effectively infer whether a target data sample is included in a client's
private training data or not. By exploiting the correlation among data features
through a non-linear decision boundary, AMI attacks with a certified guarantee
of success can achieve alarmingly high success rates under rigorous local
differential privacy (LDP) protection; thereby exposing clients' training data
to significant privacy risk. Theoretical and experimental results on several
benchmark datasets show that adding sufficient privacy-preserving noise to
prevent our attack would significantly damage FL's model utility.
( 2
min )
Accurate and real-time traffic state prediction is of great practical
importance for urban traffic control and web mapping services (e.g. Google
Maps). With the support of massive data, deep learning methods have shown their
powerful capability in capturing the complex spatio-temporal patterns of road
networks. However, existing approaches use independent components to model
temporal and spatial dependencies and thus ignore the heterogeneous
characteristics of traffic flow that vary with time and space. In this paper,
we propose a novel dynamic graph convolution network with spatio-temporal
attention fusion. The method not only captures local spatio-temporal
information that changes over time, but also comprehensively models
long-distance and multi-scale spatio-temporal patterns based on the fusion
mechanism of temporal and spatial attention. This design idea can greatly
improve the spatio-temporal perception of the model. We conduct extensive
experiments on 4 real-world datasets to demonstrate that our model achieves
state-of-the-art performance compared to 22 baseline models.
( 2
min )
To address the problem of medical image recognition, computer vision
techniques like convolutional neural networks (CNN) are frequently used.
Recently, 3D CNN-based models dominate the field of magnetic resonance image
(MRI) analytics. Due to the high similarity between MRI data and videos, we
conduct extensive empirical studies on video recognition techniques for MRI
classification to answer the questions: (1) can we directly use video
recognition models for MRI classification, (2) which model is more appropriate
for MRI, (3) are the common tricks like data augmentation in video recognition
still useful for MRI classification? Our work suggests that advanced video
techniques benefit MRI classification. In this paper, four datasets of
Alzheimer's and Parkinson's disease recognition are utilized in experiments,
together with three alternative video recognition models and data augmentation
techniques that are frequently applied to video tasks. In terms of efficiency,
the results reveal that the video framework performs better than 3D-CNN models
by 5%-11% with 50%-66% fewer trainable parameters. This report pushes
forward the potential fusion of 3D medical imaging and video understanding
research.
( 2
min )
Despite the major progress of deep models as learning machines, uncertainty
estimation remains a major challenge. Existing solutions rely on modified loss
functions or architectural changes. We propose to compensate for the lack of
built-in uncertainty estimates by supplementing any network, retrospectively,
with a subsequent vine copula model, in an overall compound we call Vine-Copula
Neural Network (VCNN). Through synthetic and real-data experiments, we show
that VCNNs could be task (regression/classification) and architecture
(recurrent, fully connected) agnostic while providing reliable and
better-calibrated uncertainty estimates, comparable to state-of-the-art
built-in uncertainty solutions.
( 2
min )
This paper presents a novel approach for multimodal data fusion based on the
Vector-Quantized Variational Autoencoder (VQVAE) architecture. The proposed
method is simple yet effective in achieving excellent reconstruction
performance on paired MNIST-SVHN data and WiFi spectrogram data. Additionally,
the multimodal VQVAE model is extended to the 5G communication scenario, where
an end-to-end Channel State Information (CSI) feedback system is implemented to
compress data transmitted between the base-station (eNodeB) and User Equipment
(UE), without significant loss of performance. The proposed model learns a
discriminative compressed feature space for various types of input data (CSI,
spectrograms, natural images, etc.), making it a suitable solution for
applications with limited computational resources.
( 2
min )
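The core vector-quantization step of a VQ-VAE bottleneck can be sketched as a nearest-codeword lookup; codebook learning (commitment loss, EMA updates) and the encoder/decoder are omitted, and the toy codebook and latents below are assumptions:

```python
import numpy as np

def vector_quantize(z, codebook):
    """VQ-VAE bottleneck step: each latent vector is replaced by its
    closest codebook entry under squared Euclidean distance."""
    # Pairwise squared distances between latents (N, D) and codes (K, D).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)        # discrete codes (what a CSI feedback
    return codebook[idx], idx      # link would actually transmit)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2]])
zq, codes = vector_quantize(z, codebook)
```

In the CSI feedback setting, only the integer indices need to cross the eNodeB/UE link, which is where the compression comes from.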
To accelerate the inference of deep neural networks (DNNs), quantization with
low-bitwidth numbers is actively researched. A prominent challenge is to
quantize the DNN models into low-bitwidth numbers without significant accuracy
degradation, especially at very low bitwidths (< 8 bits). This work targets an
adaptive data representation with variable-length encoding called DyBit. DyBit
can dynamically adjust the precision and range of separate bit-fields to adapt
to the distribution of DNN weights/activations. We also propose a
hardware-aware quantization framework with a mixed-precision accelerator to
trade-off the inference accuracy and speedup. Experimental results demonstrate
that the inference accuracy via DyBit is 1.997% higher than the
state-of-the-art at 4-bit quantization, and the proposed framework can achieve
up to 8.1x speedup compared with the original model.
( 2
min )
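As a rough, generic illustration of adapting a representation to a tensor's distribution (not the DyBit encoding itself, whose bit-field format is defined in the paper), a per-tensor fixed-point quantizer can search over integer/fractional splits and keep the one with the smallest error:

```python
import numpy as np

def best_fixed_point(w, total_bits=4):
    """Generic sketch: for a signed fixed-point code with `total_bits`, try
    every split between integer and fractional bits and keep the split with
    the smallest max error, mimicking how an adaptive representation matches
    precision and range to the tensor."""
    best = None
    for frac_bits in range(total_bits):
        scale = 2.0 ** frac_bits
        lo = -(2 ** (total_bits - 1))
        hi = 2 ** (total_bits - 1) - 1
        q = np.clip(np.round(w * scale), lo, hi) / scale
        max_err = np.abs(w - q).max()
        if best is None or max_err < best[0]:
            best = (max_err, frac_bits, q)
    return best  # (max error, chosen fractional bits, dequantized tensor)

# A weight tensor concentrated near zero prefers more fractional bits.
w_small = np.array([0.11, -0.32, 0.05, 0.27])
err, frac, q = best_fixed_point(w_small, total_bits=4)
```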
We study differentially private (DP) machine learning algorithms as instances
of noisy fixed-point iterations, in order to derive privacy and utility results
from this well-studied framework. We show that this new perspective recovers
popular private gradient-based methods like DP-SGD and provides a principled
way to design and analyze new private optimization algorithms in a flexible
manner. Focusing on the widely-used Alternating Directions Method of
Multipliers (ADMM) method, we use our general framework to derive novel private
ADMM algorithms for centralized, federated and fully decentralized learning.
For these three algorithms, we establish strong privacy guarantees leveraging
privacy amplification by iteration and by subsampling. Finally, we provide
utility guarantees using a unified analysis that exploits a recent linear
convergence result for noisy fixed-point iterations.
( 2
min )
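The noisy fixed-point perspective recovers the familiar DP-SGD update: clip each per-example gradient, average, and add calibrated Gaussian noise. A minimal sketch, with illustrative (not the paper's) hyperparameters:

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, clip_norm=1.0, noise_mult=1.0,
                lr=0.1, rng=None):
    """One DP-SGD update viewed as a noisy fixed-point iteration."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))  # clip
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=w.shape)                                # privatize
    return w - lr * (avg + noise)

grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]  # norms 5.0 and ~0.22
w_new = dp_sgd_step(np.zeros(2), grads)
```

The same clip-average-noise template, applied to the ADMM fixed-point map instead of the gradient map, is what yields the private ADMM variants.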
Recent advancements in interpretability research have made transformer language
models more transparent. This progress led to a better understanding of their
inner workings for toy and naturally occurring models. However, how these
models internally process sentiment changes has yet to be sufficiently
answered. In this work, we introduce a new interpretability tool called PCP
ablation, where we replace modules with low-rank matrices based on the
principal components of their activations, reducing model parameters and their
behavior to essentials. We demonstrate PCP ablations on MLP and attention
layers in backdoored toy, backdoored large, and naturally occurring models. We
determine MLPs as most important for the backdoor mechanism and use this
knowledge to remove, insert, and modify backdoor mechanisms with engineered
replacements via PCP ablation.
( 2
min )
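The core PCP-ablation move, replacing a module with a low-rank matrix built from the principal components of its activations, can be sketched as follows (the toy module and activation data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Activations of a module on a batch; suppose they concentrate on a
# low-dimensional subspace (here 2 directions in a 6-d space).
directions = rng.normal(size=(2, 6))
acts = rng.normal(size=(100, 2)) @ directions

W = rng.normal(size=(6, 4))            # the module's weight matrix

# Principal components of the activations (PCA via SVD of centered data).
centered = acts - acts.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
P = Vt[:2].T                           # top-2 principal directions, (6, 2)

# PCP-style replacement: restrict the module's input to the principal
# subspace, giving a rank-2 stand-in for W.
W_low = P @ (P.T @ W)

# On inputs drawn from the activation subspace, the replacement matches W.
err = np.abs(acts @ W_low - acts @ W).max()
```

Because the replacement only preserves behavior on the principal subspace, comparing it against the full module is what isolates which directions a mechanism (e.g. a backdoor) actually uses.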
We prove that the set of functions representable by ReLU neural networks with
integer weights strictly increases with the network depth while allowing
arbitrary width. More precisely, we show that $\lceil\log_2(n)\rceil$ hidden
layers are indeed necessary to compute the maximum of $n$ numbers, matching
known upper bounds. Our results are based on the known duality between neural
networks and Newton polytopes via tropical geometry. The integrality assumption
implies that these Newton polytopes are lattice polytopes. Then, our depth
lower bounds follow from a parity argument on the normalized volume of faces of
such polytopes.
( 2
min )
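The upper bound being matched here, computing the max of $n$ numbers with $\lceil\log_2(n)\rceil$ ReLU layers, rests on the integer-weight identity max(a, b) = b + ReLU(a - b). A small sketch of the construction:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_via_relu_layers(values):
    """Compute max of n numbers with ceil(log2 n) rounds of pairwise maxima;
    each round is realizable by one ReLU hidden layer with integer weights,
    since max(a, b) = b + relu(a - b)."""
    vals = list(values)
    layers = 0
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals) - 1, 2):
            a, b = vals[i], vals[i + 1]
            nxt.append(b + relu(a - b))   # pairwise max, integer weights only
        if len(vals) % 2:
            nxt.append(vals[-1])          # odd element passes through
        vals = nxt
        layers += 1
    return vals[0], layers

m, depth = max_via_relu_layers([3.0, -1.0, 7.0, 2.0, 5.0])
```

The paper's contribution is the matching lower bound: no shallower integer-weight ReLU network can compute the same function.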
Morphological atlases are an important tool in organismal studies, and modern
high-throughput Computed Tomography (CT) facilities can produce hundreds of
full-body high-resolution volumetric images of organisms. However, creating an
atlas from these volumes requires accurate organ segmentation. In the last
decade, machine learning approaches have achieved incredible results in image
segmentation tasks, but they require large amounts of annotated data for
training. In this paper, we propose a self-training framework for multi-organ
segmentation in tomographic images of Medaka fish. We utilize the
pseudo-labeled data from a pretrained Teacher model and adopt a Quality
Classifier to refine the pseudo-labeled data. Then, we introduce a pixel-wise
knowledge distillation method to prevent overfitting to the pseudo-labeled data
and improve the segmentation performance. The experimental results demonstrate
that our method improves mean Intersection over Union (IoU) by 5.9% on the full
dataset and maintains segmentation quality while using three times less annotated data.
( 2
min )
Studies involving both randomized experiments as well as observational data
typically involve time-to-event outcomes such as time-to-failure, death or
onset of an adverse condition. Such outcomes are typically subject to censoring
due to loss of follow-up and established statistical practice involves
comparing treatment efficacy in terms of hazard ratios between the treated and
control groups. In this paper we propose a statistical approach to recovering
sparse phenogroups (or subtypes) that demonstrate differential treatment
effects as compared to the study population. Our approach involves modelling
the data as a mixture while enforcing parameter shrinkage through structured
sparsity regularization. We propose a novel inference procedure for the
proposed model and demonstrate its efficacy in recovering sparse phenotypes
across large landmark real world clinical studies in cardiovascular health.
( 2
min )
Previous pitch-controllable text-to-speech (TTS) models rely on directly
modeling fundamental frequency, leading to low variance in synthesized speech.
To address this issue, we propose PITS, an end-to-end pitch-controllable TTS
model that utilizes variational inference to model pitch. Based on VITS, PITS
incorporates the Yingram encoder, the Yingram decoder, and adversarial training
of pitch-shifted synthesis to achieve pitch-controllability. Experiments
demonstrate that PITS generates high-quality speech that is indistinguishable
from ground truth speech and has high pitch-controllability without quality
degradation. Code and audio samples will be available at
https://github.com/anonymous-pits/pits.
( 2
min )
Effectively scaling large Transformer models is a main driver of recent
advances in natural language processing. Dynamic neural networks, as an
emerging research direction, are capable of scaling up neural networks with
sub-linear increases in computation and time by dynamically adjusting their
computational path based on the input. Dynamic neural networks could be a
promising solution to the growing parameter numbers of pretrained language
models, allowing both model pretraining with trillions of parameters and faster
inference on mobile devices. In this survey, we summarize progress of three
types of dynamic neural networks in NLP: skimming, mixture of experts, and
early exit. We also highlight current challenges in dynamic neural networks and
directions for future research.
( 2
min )
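Of the three families, early exit is the easiest to sketch: attach a small classifier head to each layer and stop as soon as its confidence clears a threshold, skipping the remaining computation. The toy network and threshold below are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_forward(x, layers, heads, threshold=0.9):
    """Early-exit inference: after each layer, a classifier head predicts;
    if its confidence clears the threshold, skip the remaining layers."""
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = np.tanh(layer @ h)
        probs = softmax(head @ h)
        if probs.max() >= threshold:
            return probs.argmax(), depth      # confident: exit early
    return probs.argmax(), depth              # fell through: full depth

rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 8)) for _ in range(4)]
heads = [rng.normal(size=(3, 8)) for _ in range(4)]
label, used = early_exit_forward(rng.normal(size=8), layers, heads,
                                 threshold=0.5)
```

Skimming and mixture-of-experts follow the same principle with a different routing unit: tokens are dropped or routed to a subset of experts instead of truncating depth.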
Contextual bandit algorithms often estimate reward models to inform
decision-making. However, true rewards can contain action-independent
redundancies that are not relevant for decision-making. We show it is more
data-efficient to estimate any function that explains the reward differences
between actions, that is, the treatment effects. Motivated by this observation,
building on recent work on oracle-based bandit algorithms, we provide the first
reduction of contextual bandits to general-purpose heterogeneous treatment
effect estimation, and we design a simple and computationally efficient
algorithm based on this reduction. Our theoretical and experimental results
demonstrate that heterogeneous treatment effect estimation in contextual
bandits offers practical advantages over reward estimation, including more
efficient model estimation and greater flexibility to model misspecification.
( 2
min )
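The data-efficiency argument can be seen in a toy simulation (a made-up data-generating process, with both potential outcomes observed purely for illustration): when rewards share a large action-independent nuisance, regressing the reward *difference* recovers the treatment effect with a simple model, while regressing the raw reward must also explain the nuisance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rewards share a large action-independent nuisance g(x); only the
# treatment effect tau(x) = r(x, 1) - r(x, 0) matters for the decision.
n = 500
x = rng.normal(size=(n, 3))
g = 5.0 * np.sin(x).sum(axis=1)               # nonlinear, action-independent
tau = x @ np.array([1.0, -0.5, 0.0])          # simple, decision-relevant
r0 = g + rng.normal(scale=0.1, size=n)        # observed reward, action 0
r1 = g + tau + rng.normal(scale=0.1, size=n)  # observed reward, action 1

# Fitting the reward difference with a linear model removes g entirely...
beta, *_ = np.linalg.lstsq(x, r1 - r0, rcond=None)
tau_err = np.abs(x @ beta - tau).mean()

# ...whereas fitting the raw reward must also explain the nonlinear g.
gamma_, *_ = np.linalg.lstsq(x, r1, rcond=None)
reward_err = np.abs(x @ gamma_ - (g + tau)).mean()
```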
Non-asymptotic statistical analysis is often missing for modern
geometry-aware machine learning algorithms due to the possibly intricate
non-linear manifold structure. This paper studies an intrinsic mean model on
the manifold of restricted positive semi-definite matrices and provides a
non-asymptotic statistical analysis of the Karcher mean. We also consider a
general extrinsic signal-plus-noise model, under which a deterministic error
bound of the Karcher mean is provided. As an application, we show that the
distributed principal component analysis algorithm, LRC-dPCA, achieves the same
performance as the full sample PCA algorithm. Numerical experiments lend strong
support to our theories.
( 2
min )
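For intuition, the Karcher mean on the SPD manifold can be computed with the standard fixed-point iteration under the affine-invariant geometry; this is a generic textbook sketch, not the restricted positive semi-definite setting the paper analyzes:

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive definite matrix via eigh."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def karcher_mean(mats, iters=50):
    """Fixed-point iteration for the Karcher (Frechet) mean: average the
    logs of the whitened matrices, then map back through the exponential."""
    M = np.mean(mats, axis=0)                 # arithmetic mean as warm start
    for _ in range(iters):
        M_half = spd_exp(0.5 * spd_log(M))
        M_ihalf = np.linalg.inv(M_half)
        T = np.mean([spd_log(M_ihalf @ A @ M_ihalf) for A in mats], axis=0)
        M = M_half @ spd_exp(T) @ M_half      # Riemannian gradient step
    return M

# The Karcher mean of diag(2, 0.5) and diag(0.5, 2) is the identity.
A = np.array([[2.0, 0.0], [0.0, 0.5]])
B = np.array([[0.5, 0.0], [0.0, 2.0]])
M = karcher_mean([A, B])
```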
Traffic prediction is a flourishing research field due to its importance in
human mobility in the urban space. Despite this, existing studies only focus on
short-term prediction of up to a few hours in advance, with most being up to one
hour only. Long-term traffic prediction can enable more comprehensive,
informed, and proactive measures against traffic congestion and is therefore an
important task to explore. In this paper, we explore the task of long-term
traffic prediction; where we predict traffic up to 24 hours in advance. We note
the weaknesses of existing models--which are based on recurrent structures--for
long-term traffic prediction and propose a modified Transformer model
"TrafFormer". Experiments comparing our model with existing hybrid neural
network models show the superiority of our model.
( 2
min )
In sponsored search advertising (SSA), keywords serve as the basic unit of the
business model, linking three stakeholders: consumers, advertisers and search
engines. This paper presents an overarching framework for keyword decisions
that highlights the touchpoints in search advertising management, including
four levels of keyword decisions, i.e., domain-specific keyword pool
generation, keyword targeting, keyword assignment and grouping, and keyword
adjustment. Using this framework, we review the state-of-the-art research
literature on keyword decisions with respect to techniques, input features and
evaluation metrics. Finally, we discuss evolving issues and identify potential
gaps that exist in the literature and outline novel research perspectives for
future exploration.
( 2
min )
The cosmic microwave background (CMB) is a significant source of knowledge
about the origin and evolution of our universe. However, observations of the
CMB are contaminated by foreground emissions, obscuring the CMB signal and
reducing its efficacy in constraining cosmological parameters. We employ deep
learning as a data-driven approach to CMB cleaning from multi-frequency
full-sky maps. In particular, we develop a graph-based Bayesian convolutional
neural network based on the U-Net architecture that predicts cleaned CMB with
pixel-wise uncertainty estimates. We demonstrate the potential of this
technique on realistic simulated data based on the Planck mission. We show that
our model accurately recovers the cleaned CMB sky map and resulting angular
power spectrum while identifying regions of uncertainty. Finally, we discuss
the current challenges and the path forward for deploying our model for CMB
recovery on real observations.
( 2
min )
Modelling stockpiles is a key factor in the economics and operation of a
mining project, because not all mined ore can be milled, for many reasons.
Further, the financial value of the ore in the stockpile needs to be reflected
on the balance sheet. Therefore, automatically tracking the frontiers of the
stockpile facilitates the mine scheduling engineers to calculate the tonnage of
the ore remaining in the stockpile. This paper suggests how the dynamic of
stockpile shape changes caused by dumping and reclaiming operations can be
inferred using polygon models. The presented work also demonstrates how the
geometry of stockpiles can be inferred in the absence of reclaimed bucket
information, in which case the reclaim polygons are established using the
diggers' GPS positional data at the time of truck loading. This work further
compares two polygon models for creating 2D shapes.
( 2
min )
Fast model updates for unseen tasks on intelligent edge devices are crucial
but also challenging due to the limited computational power. In this paper, we
propose MetaLDC, which meta-trains brain-inspired ultra-efficient
low-dimensional computing classifiers to enable fast adaptation on tiny devices
with minimal computational costs. Concretely, during the meta-training stage,
MetaLDC meta-trains a representation offline by explicitly taking into account
that the final (binary) class layer will be fine-tuned for fast adaptation for
unseen tasks on tiny devices; during the meta-testing stage, MetaLDC uses
closed-form gradients of the loss function to enable fast adaptation of the
class layer. Unlike traditional neural networks, MetaLDC is designed based on
the emerging LDC framework to enable ultra-efficient on-device inference. Our
experiments have demonstrated that compared to SOTA baselines, MetaLDC achieves
higher accuracy, robustness against random bit errors, as well as
cost-efficient hardware computation.
( 2
min )
Since its introduction in 2017, physics-informed deep learning (PIDL) has
garnered growing popularity in understanding the evolution of systems governed
by physical laws in terms of partial differential equations (PDEs). However,
empirical evidence points to the limitations of PIDL for learning certain types
of PDEs. In this paper, we (a) present the challenges in training PIDL
architecture, (b) contrast the performance of PIDL architecture in learning a
first order scalar hyperbolic conservation law and its parabolic counterpart,
(c) investigate the effect of training data sampling, which corresponds to
various sensing scenarios in traffic networks, (d) comment on the implications
of PIDL limitations for traffic flow estimation and prediction in practice.
In a detailed case study, we present the contrast in PIDL results
between learning the traffic flow model (LWR PDE) and its variation with
diffusion. The outcome indicates that PIDL experiences significant challenges
in learning the hyperbolic LWR equation due to the non-smoothness of its
solution. On the other hand, the architecture with parabolic PDE, augmented
with the diffusion term, leads to the successful reassembly of the density data
even with the shockwaves present.
( 2
min )
Federated learning (FL) is a popular technique for training a global model on
data distributed across client devices. Like other distributed training
techniques, FL is susceptible to straggler (slower or failed) clients. Recent
work has proposed to address this through device-to-device (D2D) offloading,
which introduces privacy concerns. In this paper, we propose a novel
straggler-optimal approach for coded matrix computations which can
significantly reduce the communication delay and privacy issues introduced from
D2D data transmissions in FL. Moreover, our proposed approach leads to a
considerable improvement of the local computation speed when the generated data
matrix is sparse. Numerical evaluations confirm the superiority of our proposed
method over baseline approaches.
( 2
min )
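One classical straggler-tolerant scheme (a generic illustration, not necessarily the paper's construction) is polynomial coding: encode the row blocks of A as evaluations of a matrix polynomial, so that any k of n worker results suffice to recover the full product:

```python
import numpy as np

def encode_blocks(A_blocks, xs):
    """Worker i receives the matrix polynomial sum_j A_j * xs[i]**j."""
    return [sum(Aj * (x ** j) for j, Aj in enumerate(A_blocks)) for x in xs]

def decode(results, xs_used, k):
    """Any k worker results determine the degree-(k-1) matrix polynomial:
    solve a Vandermonde system; row j of the answer is A_j @ b."""
    V = np.vander(np.asarray(xs_used, dtype=float), N=k, increasing=True)
    return np.linalg.solve(V, np.stack(results))

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
b = rng.normal(size=3)
A_blocks = [A[:2], A[2:]]          # split A into 2 row blocks (k = 2)
xs = [1.0, 2.0, 3.0, 4.0]          # evaluation points for 4 workers
worker_out = [E @ b for E in encode_blocks(A_blocks, xs)]

# Workers 0 and 2 straggle: decode from the remaining two results only.
recovered = decode([worker_out[1], worker_out[3]], [xs[1], xs[3]], k=2)
```

Because each worker holds only a coded combination of blocks, no single device sees another client's raw data, which is the privacy angle relative to plain D2D offloading.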
We revisit the original approach of using deep learning and neural networks
to solve differential equations by incorporating the knowledge of the equation.
This is done by adding a dedicated term to the loss function during the
optimization procedure in the training process. The so-called physics-informed
neural networks (PINNs) are tested on a variety of academic ordinary
differential equations in order to highlight the benefits and drawbacks of this
approach with respect to standard integration methods. We focus on the
possibility to use the least possible amount of data into the training process.
The principles of PINNs for solving differential equations by enforcing
physical laws via penalizing terms are reviewed. A tutorial on a simple
equation model illustrates how to put into practice the method for ordinary
differential equations. Benchmark tests show that a very small amount of
training data is sufficient to predict the solution when the nonlinearity of
the problem is weak. However, this is not the case for strongly nonlinear
problems where a priori knowledge of training data over some partial or the
whole time integration interval is necessary.
( 2
min )
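The penalty-term construction can be sketched for the ODE y' = -y with y(0) = 1; in an actual PINN, y and its derivative would come from a network and automatic differentiation rather than closed-form arrays:

```python
import numpy as np

def pinn_loss(y, dy_dt, y0):
    """PINN-style loss for y' = -y, y(0) = y0: a physics residual penalty
    at collocation points plus an initial-condition penalty (assumes the
    first collocation point is t = 0)."""
    residual = dy_dt + y                   # y' + y should vanish everywhere
    physics = np.mean(residual ** 2)
    initial = (y[0] - y0) ** 2
    return physics + initial

t = np.linspace(0.0, 2.0, 50)

# The exact solution gives zero loss...
y_exact = np.exp(-t)
loss_exact = pinn_loss(y_exact, -np.exp(-t), y0=1.0)

# ...while a wrong candidate is penalized by the physics residual.
y_wrong = 1.0 - 0.5 * t
loss_wrong = pinn_loss(y_wrong, np.full_like(t, -0.5), y0=1.0)
```

Training a PINN means minimizing this composite loss over network parameters; the benchmarks in the paper probe how little data the initial/boundary terms can get away with.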
Inferring causal structure from data is a challenging task of fundamental
importance in science. Observational data are often insufficient to identify a
system's causal structure uniquely. While conducting interventions (i.e.,
experiments) can improve the identifiability, such samples are usually
challenging and expensive to obtain. Hence, experimental design approaches for
causal discovery aim to minimize the number of interventions by estimating the
most informative intervention target. In this work, we propose a novel
Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts'
the gradient estimator of a gradient-based causal discovery framework to
provide signals for the intervention acquisition function. We provide extensive
experiments in simulated and real-world datasets and demonstrate that GIT
performs on par with competitive baselines, surpassing them in the low-data
regime.
( 2
min )
Bayesian additive regression trees (BART) is a semi-parametric regression
model offering state-of-the-art performance on out-of-sample prediction.
Despite this success, standard implementations of BART typically provide
inaccurate prediction and overly narrow prediction intervals at points outside
the range of the training data. This paper proposes a novel extrapolation
strategy that grafts Gaussian processes to the leaf nodes in BART for
predicting points outside the range of the observed data. The new method is
compared to standard BART implementations and recent frequentist
resampling-based methods for predictive inference. We apply the new approach to
a challenging problem from causal inference, wherein for some regions of
predictor space, only treated or untreated units are observed (but not both).
In simulation studies, the new approach boasts superior performance compared to
popular alternatives, such as Jackknife+.
( 2
min )
In this paper, we introduce two methods to solve the American-style option
pricing problem and its dual form at the same time using neural networks.
Without applying nested Monte Carlo, the first method uses a series of neural
networks to simultaneously compute both the lower and upper bounds of the
option price, and the second one accomplishes the same goal with one global
network. The avoidance of extra simulations and the use of neural networks
significantly reduce the computational complexity and allow us to price
Bermudan options with frequent exercise opportunities in high dimensions, as
illustrated by the provided numerical experiments. As a by-product, these
methods also derive a hedging strategy for the option, which can additionally
be used as a control variate for variance reduction.
( 2
min )
A Shared Nearest Neighbor (SNN) graph is a type of graph construction using
shared nearest neighbor information, which is a secondary similarity measure
based on the rankings induced by a primary $k$-nearest neighbor ($k$-NN)
measure. SNN measures have been touted as being less prone to the curse of
dimensionality than conventional distance measures, and thus methods using SNN
graphs have been widely used in applications, particularly in clustering
high-dimensional data sets and in finding outliers in subspaces of high
dimensional data. Despite this, the theoretical study of SNN graphs and graph
Laplacians remains unexplored. In this pioneering work, we make the first
contribution in this direction. We show that large scale asymptotics of an SNN
graph Laplacian reach a consistent continuum limit; this limit is the same as
that of a $k$-NN graph Laplacian. Moreover, we show that the pointwise
convergence rate of the graph Laplacian is linear with respect to $(k/n)^{1/m}$
with high probability.
( 2
min )
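A minimal sketch of the secondary SNN measure: compute Euclidean k-NN sets (the primary measure) and score each pair of points by the overlap of their neighbor sets. The two-cluster data below are a made-up illustration:

```python
import numpy as np

def snn_similarity(X, k):
    """Shared nearest neighbor similarity: the primary measure is Euclidean
    k-NN; the secondary similarity of two points is the (normalized) size
    of the overlap of their k-nearest-neighbor sets."""
    n = X.shape[0]
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-neighbors
    knn = np.argsort(d, axis=1)[:, :k]          # indices of k nearest points
    sets = [set(row) for row in knn]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = len(sets[i] & sets[j])
    return S / k                                # normalize to [0, 1]

# Two well-separated clusters: within-cluster SNN similarity is positive,
# across-cluster similarity is zero.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = snn_similarity(X, k=2)
```

The SNN graph Laplacian whose continuum limit the paper studies is then built from S the same way a k-NN graph Laplacian is built from distances.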
Hi everyone, I'm doing a personal project about what people think about music-generating AIs. It would be very helpful if you could take the time to do this survey. It takes about 5 minutes. Thank you so much for your participation.
https://docs.google.com/forms/d/e/1FAIpQLSfLHjRaWAsdGrK6Zn8X-CW17Vjn0W8EJEwEflnX7ucWn2eGBA/viewform?usp=pp_url
submitted by /u/KindlyGuess419
( 41
min )
Microsoft hooks ChatGPT up to a robot, NVIDIA promises to improve AI performance 1 million times over the next decade, AWS hugs Hugging Face, ControlNet takes image generation by storm, and more -
https://scottswigart.substack.com/p/whats-new-in-generative-ai-2023-02
submitted by /u/smswigart
( 41
min )
Meet the Google for Startups Accelerator Canada class of 2023!
Bidmii is an online marketplace that quickly connects homeowners and contractors for home improvement projects, guaranteeing payment security for each party by holding payments in trust.
Chimoney enables businesses to send payments to phones, emails and Twitter, regardless of scale, currency, country and other factors.
Clavis Studio is an AI and machine learning (ML)-driven design, visualization, and sourcing platform that provides a marketplace for designers and decorators to source new clients and use supporting tools to deliver their projects.
Foqus Technologies is an AI and quantitative imaging technology company that designs and develops software solutions to enhance the speed and quality of MRI scans.
Gryd Digital …
( 43
min )
We have ancient biology, medieval institutions, and we are approaching godlike technology. There are so many nightmares that could play out and we have to be conscious of them at all times. Setting up AI systems correctly and ensuring that our rulers are responsible is the number one priority. But what happens if we do manage to retain control and agency?
If humanity can pull this off, then perhaps we can begin to imagine the incredible potential that awaits us. We are about to be the human beings that get to live through this incredible and most crucial period. What more incredible and meaningful time could there be, than getting to see and be a part of the potential transformation of our species?
https://youtu.be/TQ36hkxIx74
This video explores the concepts postulated by AI philosophers Nick Bostrom and Ray Kurzweil and entertains a cautious optimism about the future of humanity.
submitted by /u/Allisblissallislife
( 44
min )
Model tuning is the experimental process of finding the optimal parameters and configurations for a machine learning (ML) model that result in the best possible desired outcome with a validation dataset. Single objective optimization with a performance metric is the most common approach for tuning ML models. However, in addition to predictive performance, there may […]
( 12
min )
https://www.legoscript.com/these-companies-are-replacing-workers-with-chatgpt-
submitted by /u/pyactee
( 41
min )
As computing and AI advancements spanning decades are enabling incredible opportunities for people and society, they’re also raising questions about responsible development and deployment. For example, the machine learning models powering AI systems may not perform the same for everyone or every condition, potentially leading to harms related to safety, reliability, and fairness. Single metrics […]
The post Responsible AI: The research collaboration behind new open-source tools offered by Microsoft appeared first on Microsoft Research.
( 13
min )
There are a lot of chatbot-based apps that are basically internet text generators with a bit of introductory stage-setting to nudge the interaction into "user talks to helpful chatbot" as opposed to literally any other dialog on the web. Not surprisingly, these are susceptible to a user resetting
( 5
min )
AI Weirdness: the strange side of machine learning
( 2
min )
From scaling mountains in the annual California Death Ride bike challenge to creating a low-cost, open-source ventilator in the early days of the COVID-19 pandemic, NVIDIA Chief Scientist Bill Dally is no stranger to accomplishing near-impossible feats. On Friday, he achieved another rare milestone: induction into the Silicon Valley Engineering Council’s Hall of Fame.
( 5
min )
Telcos are seeking industry-standard solutions that can run 5G, AI applications and immersive graphics workloads on the same server — including for computer vision and the metaverse. To meet this need, NVIDIA is developing a new AI-on-5G solution that combines 5G vRAN, edge AI and digital twin workloads on an all-in-one, hyperconverged and GPU-accelerated system.
( 5
min )
I created two AI ChatGPT Wizards that rap battle based on topics in the twitch chat.
https://www.twitch.tv/fleetyfleet
submitted by /u/fleetisme
( 41
min )
Artificial intelligence (AI) is one of the most discussed technologies nowadays. It can alter how we live and work, yet there are concerns about its societal impact. In this blog post, we will look at the benefits and drawbacks of artificial intelligence.
submitted by /u/Boce77
( 41
min )
Before
Original Image: https://i.ibb.co/2t1XdZQ/13er.jpg (By Getty Images)
After
Version 1: https://i.ibb.co/ZYqP1LB/1903163b-ed82-4676-b220-84d194557ac3.jpg
Version 2: https://i.ibb.co/phqQK2g/ca4b8237-7986-461d-bf4c-3c47427f2be3.png
My Question
Do these look good to you guys? Please feel free to give me some feedback. Thanks!
submitted by /u/Jealous_Ad8132
( 41
min )
Hidden Markov Model implementations in R and Python for discrete and continuous observations. I have a tutorial on YouTube explaining the use and modeling of HMMs and how to run these two packages.
Code:
https://github.com/manitadayon/CD_HMM (in R)
https://github.com/manitadayon/Auto_HMM (in Python)
Tutorial:
https://www.youtube.com/watch?v=1b-sd7gulFk&ab_channel=AIandMLFundamentals
https://www.youtube.com/watch?v=ieU8JFLRw2k&ab_channel=AIandMLFundamentals
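For readers new to HMMs, the core likelihood computation that packages like these implement can be sketched in a few lines (an illustrative forward algorithm in NumPy, not code from the linked repositories):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    # pi:  initial state distribution, shape (S,)
    # A:   transition matrix, A[i, j] = P(next state j | state i)
    # B:   emission matrix, B[i, o] = P(symbol o | state i)
    # obs: observed symbol sequence (list of column indices into B)
    alpha = pi * B[:, obs[0]]          # alpha[i] = P(o_1, state_1 = i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then condition on next symbol
    return alpha.sum()                 # P(o_1, ..., o_T)
```

The packages add the pieces this sketch omits, such as parameter estimation (Baum-Welch) and decoding (Viterbi).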
submitted by /u/chess9145
( 43
min )
Hi, sorry for what is likely a dumb question. I'm relatively new to these topics.
I have a file containing rows of variable length, each with a class label (0 or 1).
Is it possible (and does it make sense) to use a k-nearest neighbors classifier on variable-length input data? The file looks something like this: https://gist.github.com/edoardottt/46dd13c60408e95c1685ee88b5f6ace8
Thanks!
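One common way to make this work, sketched here as an illustration (the feature choices are our own, not from the linked gist), is to map each variable-length row to a fixed-size vector of summary statistics and run k-NN on those vectors:

```python
import numpy as np

def featurize(row):
    # Map a variable-length numeric row to a fixed-size summary vector.
    r = np.asarray(row, dtype=float)
    return np.array([len(r), r.mean(), r.std(), r.min(), r.max()])

def knn_predict(train_rows, train_labels, row, k=3):
    # Plain k-nearest-neighbors majority vote in the summary-feature space.
    X = np.stack([featurize(r) for r in train_rows])
    dists = np.linalg.norm(X - featurize(row), axis=1)
    votes = [train_labels[i] for i in np.argsort(dists)[:k]]
    return max(set(votes), key=votes.count)
```

Padding or truncating rows to a fixed length is another option; which works better depends on whether position within a row is meaningful for your data.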
submitted by /u/edoardottt
( 45
min )
To design with AI models, user experience (UX) designers must assess the fit
between the model and user needs. Based on user research, they need to
contextualize the model's behavior and potential failures within their
product-specific data instances and user scenarios. However, our formative
interviews with ten UX professionals revealed that such a proactive discovery
of model limitations is challenging and time-intensive. Furthermore, designers
often lack technical knowledge of AI and accessible exploration tools, which
challenges their understanding of model capabilities and limitations. In this
work, we introduced a failure-driven design approach to AI, a workflow that
encourages designers to explore model behavior and failure patterns early in
the design process. The implementation of fAIlureNotes, a designer-centered
failure exploration and analysis tool, supports designers in evaluating models
and identifying failures across diverse user groups and scenarios. Our
evaluation with UX practitioners shows that fAIlureNotes outperforms today's
interactive model cards in assessing context-specific model performance.
( 2
min )
Knowledge tracing (KT) serves as a primary part of intelligent education
systems. Most current KT models either rely on expert judgments or exploit only a
single network structure, which limits the expression of learning features. To
adequately mine features of students' learning process, Deep Knowledge Tracing
Based on Spatial and Temporal Deep Representation Learning for Learning
Performance Prediction (DKT-STDRL) is proposed in this paper. DKT-STDRL first
extracts spatial features from students' learning history sequences and then
extracts temporal features to mine deeper hidden information. Specifically, the
model uses a CNN to extract spatial feature information from students' exercise
sequences. These spatial features are concatenated with the original exercise
features to form joint learning features, which are fed into a BiLSTM. Finally,
the BiLSTM extracts temporal features from the joint features to predict
whether a student will answer correctly at the next time step. Experiments on
the public education datasets
ASSISTment2009, ASSISTment2015, Synthetic-5, ASSISTchall, and Statics2011 prove
that DKT-STDRL can achieve better prediction effects than DKT and CKT.
( 2
min )
Despite their growing popularity, data-driven models of real-world dynamical
systems require large amounts of data. However, due to sensing limitations as
well as privacy concerns, this data is not always available, especially in
domains such as energy. Pre-trained models using data gathered in similar
contexts have shown enormous potential in addressing these concerns: they can
improve predictive accuracy at a much lower observational data cost. Yet, due
to the risk of negative transfer, this improvement is theoretically neither
uniform across agents nor guaranteed. In this paper, using data from
several distributed energy resources, we investigate and report preliminary
findings on several key questions in this regard. First, we evaluate the
improvement in predictive accuracy due to pre-trained models, both with and
without fine-tuning. Subsequently, we consider the question of fairness: do
pre-trained models create equal improvements for heterogeneous agents, and how
does this translate to downstream utility? Answering these questions can help
enable improvements in the creation, fine-tuning, and adoption of such
pre-trained models.
( 2
min )
We propose a new supervised learning method for Variational AutoEncoder (VAE)
which has a causally disentangled representation and achieves the causally
disentangled generation (CDG) simultaneously. In this paper, CDG is defined as
a generative model able to decode an output precisely according to the causally
disentangled representation. We found that the supervised regularization of the
encoder is not enough to obtain a generative model with CDG. Consequently, we
explore necessary and sufficient conditions on the decoder and the causal
effect for achieving CDG. Moreover, we propose a generalized metric measuring
the degree to which a model is causally disentangled generative. Numerical results on the image
and tabular datasets corroborate our arguments.
( 2
min )
Our goal is to produce methods for observational causal inference that are
auditable, easy to troubleshoot, yield accurate treatment effect estimates, and
scalable to high-dimensional data. We describe an almost-exact matching
approach that achieves these goals by (i) learning a distance metric via
outcome modeling, (ii) creating matched groups using the distance metric, and
(iii) using the matched groups to estimate treatment effects. Our proposed
method uses variable importance measurements to construct a distance metric,
making it a flexible method that can be adapted to various applications.
Concentrating on the scalability of the problem in the number of potential
confounders, we operationalize our approach with LASSO. We derive performance
guarantees for settings where LASSO outcome modeling consistently identifies
all confounders (importantly without requiring the linear model to be correctly
specified). We also provide experimental results demonstrating the auditability
of matches, as well as extensions to more general nonparametric outcome
modeling.
( 2
min )
Deep learning approaches require collection of data on many different input
features or variables for accurate model training and prediction. Since data
collection on input features could be costly, it is crucial to reduce the cost
by selecting a subset of features and developing a budget-constrained model
(BCM). In this paper, we introduce an approach to eliminating less important
features for big data analysis using Deep Neural Networks (DNNs). Once a DNN
model has been developed, we identify the weak links and weak neurons, and
remove some input features to bring the model cost within a given budget. The
experimental results show our approach is feasible and supports user selection
of a suitable BCM within a given budget.
( 2
min )
Deep networks are susceptible to numerous types of adversarial attacks.
Certified defenses provide guarantees on a model's robustness, but most of
these defenses are restricted to a single attack type. In contrast, this paper
proposes feature partition aggregation (FPA) - a certified defense against a
union of attack types, namely evasion, backdoor, and poisoning attacks. We
specifically consider an $\ell_0$ or sparse attacker that arbitrarily controls
an unknown subset of the training and test features - even across all
instances. FPA generates robustness guarantees via an ensemble whose submodels
are trained on disjoint feature sets. Following existing certified sparse
defenses, we generalize FPA's guarantees to top-$k$ predictions. FPA
significantly outperforms state-of-the-art sparse defenses, providing larger and
stronger robustness guarantees, while simultaneously being up to
5,000${\times}$ faster.
( 2
min )
Bernstein's condition is a key assumption that guarantees fast rates in
machine learning. For example, the Gibbs algorithm with prior $\pi$ has an
excess risk in $O(d_{\pi}/n)$, as opposed to the standard
$O(\sqrt{d_{\pi}/n})$, where $n$ denotes the number of observations and
$d_{\pi}$ is a complexity parameter which depends on the prior $\pi$. In this
paper, we examine the Gibbs algorithm in the context of meta-learning, i.e.,
when learning the prior $\pi$ from $T$ tasks (with $n$ observations each)
generated by a meta distribution. Our main result is that Bernstein's condition
always holds at the meta level, regardless of its validity at the observation
level. This implies that the additional cost to learn the Gibbs prior $\pi$,
which will reduce the term $d_\pi$ across tasks, is in $O(1/T)$, instead of the
expected $O(1/\sqrt{T})$. We further illustrate how this result improves on
standard rates in three different settings: discrete priors, Gaussian priors
and mixture of Gaussians priors.
( 2
min )
Deep learning is a crucial aspect of machine learning, but deep models are
vulnerable to adversarial examples, which arise in a variety of
applications. These examples can even be targeted at humans, leading
to the creation of false media, such as deepfakes, which are often used to
shape public opinion and damage the reputation of public figures. This article
will explore the concept of adversarial examples, which are comprised of
perturbations added to clean images or videos, and their ability to deceive DL
algorithms. The proposed approach achieved an accuracy of 76.2% on the DFDC
dataset.
( 2
min )
Model parallelism is conventionally viewed as a method to scale a single
large deep learning model beyond the memory limits of a single device. In this
paper, we demonstrate that model parallelism can be additionally used for the
statistical multiplexing of multiple devices when serving multiple models, even
when a single model can fit into a single device. Our work reveals a
fundamental trade-off between the overhead introduced by model parallelism and
the opportunity to exploit statistical multiplexing to reduce serving latency
in the presence of bursty workloads. We explore the new trade-off space and
present a novel serving system, AlpaServe, that determines an efficient
strategy for placing and parallelizing collections of large deep learning
models across a distributed cluster. Evaluation results on production workloads
show that AlpaServe can process requests at up to 10x higher rates or 6x more
burstiness while staying within latency constraints for more than 99% of
requests.
( 2
min )
Explainable Artificial Intelligence (XAI) techniques are frequently required
by users in many AI systems with the goal of understanding complex models,
their associated predictions, and gaining trust. While suitable for some
specific tasks during development, their adoption by organisations to enhance
trust in machine learning systems has unintended consequences. In this paper we
discuss XAI's limitations in deployment and conclude that transparency
together with rigorous validation is better suited to gaining trust in AI
systems.
( 2
min )
Despite the popularity of low-rank matrix completion, the majority of its
theory has been developed under the assumption of random observation patterns,
whereas very little is known about the practically relevant case of non-random
patterns. Specifically, a fundamental yet largely open question is to describe
patterns that allow for unique or finitely many completions. This paper
provides two such families of patterns for any rank. A key to achieving this is
a novel formulation of low-rank matrix completion in terms of Plücker
coordinates, a traditional tool in computer vision. This connection
is of potential significance to a wide family of matrix and subspace learning
problems with incomplete data.
( 2
min )
We study the statistical properties of learning to defer (L2D) to multiple
experts. In particular, we address the open problems of deriving a consistent
surrogate loss, confidence calibration, and principled ensembling of experts.
Firstly, we derive two consistent surrogates -- one based on a softmax
parameterization, the other on a one-vs-all (OvA) parameterization -- that are
analogous to the single expert losses proposed by Mozannar and Sontag (2020)
and Verma and Nalisnick (2022), respectively. We then study the frameworks'
ability to estimate $P(m_j = y \mid x)$, the probability that the $j$th expert will
correctly predict the label for x. Theory shows the softmax-based loss causes
mis-calibration to propagate between the estimates while the OvA-based loss
does not (though in practice, we find there are trade offs). Lastly, we propose
a conformal inference technique that chooses a subset of experts to query when
the system defers. We perform empirical validation on tasks for galaxy, skin
lesion, and hate speech classification.
( 2
min )
Randomly pivoted Cholesky (RPCholesky) is a natural algorithm for computing a
rank-k approximation of an N x N positive semidefinite (psd) matrix. RPCholesky
can be implemented with just a few lines of code. It requires only (k+1)N entry
evaluations and O(k^2 N) additional arithmetic operations. This paper offers
the first serious investigation of its experimental and theoretical behavior.
Empirically, RPCholesky matches or improves on the performance of alternative
algorithms for low-rank psd approximation. Furthermore, RPCholesky provably
achieves near-optimal approximation guarantees. The simplicity, effectiveness,
and robustness of this algorithm strongly support its use in scientific
computing and machine learning applications.
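To illustrate the "few lines of code" claim, here is a hedged NumPy sketch of the algorithm as we read it (sample a pivot proportionally to the residual diagonal, then perform a Cholesky-style update); consult the paper for the reference version:

```python
import numpy as np

def rp_cholesky(A, k, seed=None):
    """Rank-k approximation A ~ F @ F.T of an N x N psd matrix A.
    Touches only the diagonal plus k columns of A: (k+1)N entries."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.array(np.diag(A), dtype=float)      # diagonal of the residual
    for j in range(k):
        i = rng.choice(n, p=d / d.sum())       # pivot ~ residual diagonal
        g = A[:, i] - F[:, :j] @ F[i, :j]      # residual column at the pivot
        F[:, j] = g / np.sqrt(g[i])
        d = np.maximum(d - F[:, j] ** 2, 0.0)  # downdate the residual diagonal
    return F
```

On an exactly rank-k psd matrix, k pivots recover the matrix up to floating-point error, since each sampled pivot lies in the support of the remaining residual.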
( 2
min )
Understanding when and how much a model gradient leaks information about the
training sample is an important question in privacy. In this paper, we present
a surprising result: even without training or memorizing the data, we can fully
reconstruct the training samples from a single gradient query at a randomly
chosen parameter value. We prove the identifiability of the training data under
mild conditions: with shallow or deep neural networks and a wide range of
activation functions. We also present a statistically and computationally
efficient algorithm based on tensor decomposition to reconstruct the training
data. As a provable attack that reveals sensitive training data, our findings
suggest potential severe threats to privacy, especially in federated learning.
( 2
min )
Bayesian Optimization is a useful tool for experiment design. Unfortunately,
the classical, sequential setting of Bayesian Optimization does not translate
well into laboratory experiments, for instance battery design, where
measurements may come from different sources and their evaluations may require
significant waiting times. Multi-fidelity Bayesian Optimization addresses the
setting with measurements from different sources. Asynchronous batch Bayesian
Optimization provides a framework to select new experiments before the results
of the prior experiments are revealed. This paper proposes an algorithm
combining multi-fidelity and asynchronous batch methods. We empirically study
the algorithm behavior, and show it can outperform single-fidelity batch
methods and multi-fidelity sequential methods. As an application, we consider
designing electrode materials for optimal performance in pouch cells using
experiments with coin cells to approximate battery performance.
( 2
min )
Happy Friday! Register now for a webinar we have coming up next Tuesday at 12PM ET: Architectures for Running ML at the Edge, presented by ODSC! Registration is free, sign up here.
In this webinar, we will explore different paradigms for edge deployment of ML models, including federated learning, cloud-edge hybrid architectures, and standalone edge models. We will discuss the trade-offs and considerations for each, as well as best practices for designing and deploying ML models at the edge.
Tune in Tuesday Feb. 28 @ 12PM ET. Register here.
submitted by /u/modzykirsten
( 41
min )
Hi guys,
I have made a video on YouTube here where I explain what gradient boosting is and how it works.
I hope it may be of use to some of you. As always, feedback is more than welcome! :)
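For readers who prefer code, the core idea can be sketched in a few lines (an illustrative least-squares gradient boosting with depth-1 stumps on a single feature, our own simplification rather than the video's material): each round fits a stump to the current residuals, which are the negative gradient of squared loss, and takes a small step.

```python
import numpy as np

def fit_stump(x, r):
    # Best single-threshold split on feature x fitting residuals r
    # in squared error; returns (threshold, left value, right value).
    best_err, best = np.inf, (x[0], r.mean(), r.mean())
    for t in np.unique(x)[:-1]:            # largest value would empty the right side
        lm, rm = r[x <= t].mean(), r[x > t].mean()
        err = ((r[x <= t] - lm) ** 2).sum() + ((r[x > t] - rm) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, lm, rm)
    return best

def gradient_boost(x, y, n_rounds=60, lr=0.1):
    pred = np.full(len(y), y.mean())        # start from the mean
    for _ in range(n_rounds):
        residual = y - pred                 # negative gradient of squared loss
        t, lv, rv = fit_stump(x, residual)
        pred = pred + lr * np.where(x <= t, lv, rv)
    return pred
```

The learning rate `lr` shrinks each stump's contribution, trading more rounds for smoother convergence.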
submitted by /u/Personal-Trainer-541
( 41
min )
AI has the potential to revolutionize fraud detection by financial institutions, providing faster and more accurate detection of fraudulent activities. Here we present some ways in which AI can be used to detect and prevent fraud. https://youtu.be/luX9ecRwn_c
submitted by /u/eprepsg
( 41
min )
https://twitter.com/GuillaumeLample/status/1629151231800115202?t=4cLD6Ko2Ld9Y3EIU72-M2g&s=19
Paper here - https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
submitted by /u/MysteryInc152
( 48
min )
Excited to share "Minds", a new way to build backends and workflows entirely with AI (LLMs from OpenAI and Cohere). The AI can call your APIs, look up records in your database, etc.
With just a couple of lines of code you can build things like a question-answering service where the AI queries your local database to help answer customer support queries.
https://github.com/dosco/minds
submitted by /u/gsvclass
( 43
min )
A recent podcast interview of EY's has gone a bit viral, and in it he claims that researchers have dismissed his views without seriously engaging with his arguments, which are described here in relative detail.
I'm aware of ongoing AI safety and interpretability research, but the term "AI safety" is used both for something close to AI ethics and for preventing an existential threat to humanity, which makes it difficult, as a layperson, to distinguish the goals of, say, Anthropic, and the extent to which they consider the latter a serious concern.
I haven't personally found EY's arguments to be particularly rigorous, but I'm not the best suited person to evaluate their validity. Any thoughts are appreciated. Thanks in advance!
submitted by /u/SchmidhuberDidIt
( 44
min )
In this blog post we are discussing how to accelerate disaster response efforts using computer vision techniques for processing satellite imagery using AWS services.
( 8
min )
Amazon SageMaker multi-model endpoints (MMEs) provide a scalable and cost-effective way to deploy a large number of machine learning (ML) models. It gives you the ability to deploy multiple ML models in a single serving container behind a single endpoint. From there, SageMaker manages loading and unloading the models and scaling resources on your behalf […]
( 14
min )
Cloudy British weather is the butt of many jokes — but the United Kingdom’s national power grid is making the most of its sunshine. With the help of Open Climate Fix, a nonprofit product lab, the control room of the National Grid Electricity System Operator (ESO) is testing AI models that provide granular, near-term forecasts.
( 6
min )
I am looking at OpenAI's implementation of SAC over here. Also, here is their code to compute the action and its log prob:

class SquashedGaussianMLPActor(nn.Module):

    def __init__(self, obs_dim, act_dim, hidden_sizes, activation, act_limit):
        super().__init__()
        self.net = mlp([obs_dim] + list(hidden_sizes), activation, activation)
        self.mu_layer = nn.Linear(hidden_sizes[-1], act_dim)
        self.log_std_layer = nn.Linear(hidden_sizes[-1], act_dim)
        self.act_limit = act_limit

    def forward(self, obs, deterministic=False, with_logprob=True):
        net_out = self.net(obs)
        mu = self.mu_layer(net_out)
        log_std = self.log_std_layer(net_out)
        log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
        std = torch.exp(log_std)

        # Pre-squash distribution and sample
        pi_distribution = Normal(mu, std)
        if deterministic:
            # O…
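The part the snippet is cut off before is the tanh-squashing log-prob correction. As an illustration, here is a NumPy paraphrase of the numerically stable identity that Spinning Up's implementation uses (not the exact torch code):

```python
import numpy as np

def squashed_logprob(logp_u, u):
    # For a = tanh(u), correct the Gaussian log-prob by subtracting
    # sum_i log(1 - tanh(u_i)^2), computed via the stable identity
    # log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
    correction = 2.0 * (np.log(2.0) - u - np.logaddexp(0.0, -2.0 * u))
    return logp_u - correction.sum(axis=-1)
```

The softplus form avoids evaluating log(1 - tanh(u)^2) directly, which underflows for large |u|.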
( 45
min )
"Hotter take: ML would have advanced faster if another front-end language had been available and widely adopted instead of Python. One that is interactive yet fast & compilable, multithreaded (no GIL), isn't bloated, doesn't care about white spaces,... E.g. Julia or some Lisp."
Link from the original tweet
submitted by /u/Marcapiel
Over the last 10 years, a number of players have developed autonomous vehicle (AV) systems using deep neural networks (DNNs). These systems have evolved from simple rule-based systems to Advanced Driver Assistance Systems (ADAS) and fully autonomous vehicles. These systems require petabytes of data and thousands of compute units (vCPUs and GPUs) to train. This […]
https://www.legoscript.com/we-will-die-if-not-careful
submitted by /u/pyactee
Discover the top 5 uses of UI/UX design in 2023. Engage your users, increase conversion rates, and boost ROI with better user experiences.
The post Maximizing Business Success with UI/UX Design: The Top 5 Advantages appeared first on Data Science Central.
The do-it-yourself climate modeling movement is here. Researchers from Northwestern University and Argonne National Laboratory have been launching NVIDIA Jetson-driven edge computing Waggle devices across the globe to collect hyper-local climate information. Waggle is an open source sensor platform for edge computing developed by Argonne. Working with this, scientists share open-source AI code.
A million developers across the globe are now using the NVIDIA Jetson platform for edge AI and robotics to build innovative technologies. Plus, more than 6,000 companies — a third of which are startups — have integrated the platform with their products. These milestones and more will be celebrated during an upcoming NVIDIA Jetson edge AI event.
To drive the automotive industry forward, NVIDIA and Mercedes-Benz are taking the virtual road. NVIDIA founder and CEO Jensen Huang joined Mercedes-Benz CEO Ola Källenius on stage at the automaker’s strategy update event yesterday in Silicon Valley, showcasing progress in their landmark partnership to digitalize the entire product lifecycle, plus the ownership and automated driving experience.
The cloud just got bigger. NVIDIA and Microsoft announced this week they’re working to bring top PC Xbox Game Studios games to the GeForce NOW library, including titles from Bethesda, Mojang Studios and Activision, pending closure of Microsoft’s acquisition. Six new games join the cloud this week for members to stream.
This post is co-written with Swagata Ashwani, Senior Data Scientist at Boomi. Boomi is an enterprise-level software as a service (SaaS) independent software vendor (ISV) that creates developer enablement tooling for software engineers. These tools integrate via API into Boomi’s core service offering. In this post, we discuss how Boomi used the bring-your-own-container (BYOC) approach […]
"Deep learning is the only thing that currently works at scale; it's the only class of algorithms that is able to discover arbitrary functions in a reasonable amount of time."
https://www.youtube.com/watch?v=p-OYPRhqRCg
I know of the universal approximation theorem. But is there any mathematical formulation of this statement?
submitted by /u/GraciousReformer
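One way to make the theorem concrete (a toy sketch, not a proof of the quoted scaling claim): the universal approximation theorem says a single hidden layer of nonlinear units can approximate any continuous function on a compact set arbitrarily well. Some functions are even represented exactly, e.g. |x| by a two-unit ReLU layer, since relu(x) + relu(-x) = |x|:

    # A one-hidden-layer ReLU "network" with two hidden units and
    # output weights (1, 1) computes |x| exactly.
    def relu(x):
        return max(x, 0.0)

    def net_abs(x):
        # hidden layer: relu(1*x), relu(-1*x); output: sum with weight 1 each
        return 1.0 * relu(1.0 * x) + 1.0 * relu(-1.0 * x)

    for x in [-2.5, -1.0, 0.0, 0.3, 4.0]:
        assert net_abs(x) == abs(x)

The general theorem only guarantees approximation, not exact representation, and says nothing about how many units are needed or how hard training is, which is why it does not by itself formalize the "works at scale" claim.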
Laptops equipped with NVIDIA GeForce RTX 4070, 4060 and 4050 GPUs are now available. The new lineup — including NVIDIA Studio-validated laptops from ASUS, GIGABYTE and Samsung — gives creators more options to create from anywhere with lighter, thinner devices that dramatically exceed the performance of the last generation.
Similar to product explainer videos, like here: https://www.youtube.com/playlist?list=PL2P1Z-F3mmqxsMlpCp6wpeqAqlusiuZ_h
I've tried different services, but either the result was not good enough (e.g., Steve.ai has a "script to animation" feature, but the result was very limited) or the service did not cover the script-to-video case (e.g., https://www.synthesia.io/)
submitted by /u/muran123456
I have a lot of photos on my portfolio website and usually post them on social media in series, like this example, but I want to find new and creative ways to combine/curate photos differently that are visually appealing. To come up with ideas outside of my own head, I thought maybe there is a tool that can help.
submitted by /u/Northlandscapes
After you build, train, and evaluate your machine learning (ML) model to ensure it solves the intended business problem, you want to deploy that model to enable decision-making in business operations. Models that support business-critical functions are deployed to a production environment where a model release strategy is put in place. Given the nature […]
We’re thrilled to announce an expanded collaboration between AWS and Hugging Face to accelerate the training, fine-tuning, and deployment of large language and vision models used to create generative AI applications. Generative AI applications can perform a variety of tasks, including text summarization, answering questions, code generation, image creation, and writing essays and articles. AWS […]
Data passivity and the current obsession with off-the-shelf chatbots: Last September, Bill Schmarzo (“Point – Counterpoint on Why Organizations Suck at AI”) listed a few common excuses enterprises use to explain why they aren’t doing more with AI: We Don’t Have the Right Talent. “We can’t hire the right talent and don’t have bottomless budgets…”
The post DSC Weekly 21 February 2023 – Data Passivity and the Current Obsession with Off-The-Shelf Chatbots appeared first on Data Science Central.
With every passing year, data analytics services are gaining more prominence as most enterprises realize the potential of data in driving important business decisions. The growing availability of data, developments in technology, and mounting demand for data-driven insights will contribute to this trend. Additionally, the upsurge of big data and cloud computing will make it easier…
The post The Impact of AI-enabled Data Analytics Services Across Major Industries appeared first on Data Science Central.
Cybercriminals still attack startup businesses even though they may have smaller databases and less information to steal compared to the big players in the market. Why? Bad actors take the path of least resistance, and startups tend to be less equipped to defend against cyber attacks, spending an average of $500 or less on cybersecurity…
The post How to Build a Robust Cybersecurity Strategy for Your Startup appeared first on Data Science Central.
The telecommunications industry has for decades helped advance revolutionary change – enabling everything from telephones and television to online streaming and self-driving cars. Yet the industry has long been considered an evolutionary mover in its own business. A recent survey of more than 400 telecommunications industry professionals from around the world found that same cautious approach.
Sponsored Post: The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.